Preface



Now that I have compiled these proceedings, it is a great pleasure to thank all 
involved. 

The first thanks go to the scientific community interested in hiding informa- 
tion or in stopping other people doing this. At the initiative of Ross Anderson 
in 1995, we came together for the first international meeting at IH’96 in Cam- 
bridge, and subsequently met at IH’98 in Portland. Our community now consists 
of about 200 people collaborating around the world - and making remarkable 
progress. It is our common conviction that in the long run, much more security is 
achieved by open discussion and public selection of mechanisms and implementations
than by "security by obscurity". This is especially true for large commercial
systems, and it is most probably also true within the field of information hiding. 
Trying to hide the design and implementation of hiding mechanisms may be 
particularly tempting - since hiding is an issue anyway. But as shown by the 
breaks of quite a few digital copy-protection systems within the last few years,
"security by obscurity" may prove not to be commercially viable, at least not in
civil societies. 

The scientific community submitted 68 papers to this conference IH’99. This 
was many more papers than we had expected, and they were of much higher 
quality than we had dared to hope for. Many thanks to all authors. 

To cope with this situation, the program committee, consisting of Ross An- 
derson (Cambridge University), David Aucsmith (Intel, Portland, OR), Jean- 
Paul Linnartz (Philips Research, Eindhoven), Steve Low (University of Mel- 
bourne), Ira Moskowitz (US Naval Research Laboratory), Jean-Jacques 
Quisquater (Université catholique de Louvain), Michael Waidner (IBM Research,
Zurich), and me, decided to ask additional experts to help in the review of pa- 
pers. 

We received reviews from Christian Cachin (IBM Research, Zurich), LiWu Chang
(US Naval Research Laboratory), Elke Franz (Dresden Univ. of Technology), Ton
Kalker (Philips Research, Eindhoven), Herbert Klimant (Dresden Univ. of
Technology), Markus Kuhn (Cambridge University), Peter Lenoir (Philips Research,
Eindhoven), Thomas Mittelholzer (IBM Research, Zurich), Luke O'Connor (IBM
Research, Zurich), Fabien Petitcolas (Cambridge University), Ahmad-Reza
Sadeghi (Univ. des Saarlandes), Andreas Westfeld (Dresden Univ. of
Technology), and Jan Zöllner (Dresden Univ. of Technology). Thanks to all program
committee members and reviewers, who between them contributed over 200
reviews, which I batched and delivered in anonymized form to the whole program
committee. (Special thanks go to Ross Anderson for handling all reviews of
papers of which I was one of the authors.)

Due to the space limitations of a three-day, single-stream workshop, the
program committee could only accept 33 papers to allow speaker slots of 30
minutes. This meant we had, regrettably, to reject some papers which deserved
acceptance. As a result, we did not provide space for an invited talk this year.
To open the floor to additional ideas, we did arrange a rump session. 

Within the program committee, we had quite a few discussions on the merits 
of borderline papers, but in the end, we achieved a consensus on the program. 
Many thanks to all members of the committee; it was a pleasure to work with 
you. It was an achievement that, in spite of a very tight schedule and many more 
papers than expected, we managed to finish the job and to provide feedback to 
all the authors three days before schedule. 

IH’99 would have never become a reality without the organizational help 
of my secretary Martina Gersonde, who handled everything to do with accom- 
modation, registration, printing the pre-proceedings, and organizing the various 
social events. During her holidays, Anja Jerichow stepped in. They and Kerstin 
Achtruth provided all sorts of services during the workshop. Petra Humann and
Andreas Westfeld provided IT support both around and during the workshop. 
Hannes Federrath, being the art director of our institute in his spare time,
handled all issues concerning our website and added the flavor and style to our basic
functionality. As all preparation for the workshop was done completely online to 
avoid the costs of printing and mailing, this was especially valuable. 

At this year’s information hiding workshop, watermarking was the big dom- 
inating theme - at least for industry. At IH’96 and IH’98, we had a much more 
balanced mixture of the different fields of information hiding. I hope this will 
be the case again for IH’01, wherever it will take place. IH’99 could be called 
the "Workshop on Watermarking Resistant to Common Lossy Compression".
We now know fairly well how to achieve this, but have more or less no idea how 
to achieve real security against well targeted attacks on watermarks. Industry’s 
hope of copy protection by watermarking either needs a real scientific break- 
through - which I do not expect since there are so many kinds of slight changes 
an un-marking tool might make after the watermark has been embedded - or 
a more realistic perspective: systems that use copyright registration as the pri- 
mary control mechanism and watermarking only as a secondary means to help 
keep honest people honest. If this is not commercially viable, then other means 
are needed to reward content providers than giving them the illusion of copy 
control. Perhaps as a researcher outside of industry, it falls to me to say this so 
frankly. 



November 1999 



Andreas Pfitzmann 
An Information-Theoretic Approach to
Steganography and Watermarking 



Thomas Mittelholzer* 

IBM Zurich Research Laboratory 
Saumerstrasse 4, CH-8803 Rueschlikon, Switzerland 
tmi@zurich.ibm.com



Abstract. A theoretical model for steganography and digital water- 
marking is presented, which includes a stego encoder, a stego channel 
and a stego decoder. The first part states the basic steganographic and 
watermarking problems in terms of mutual information of the involved 
quantities such as the secret message, the stego message and the modified 
stego message. General lower bounds on the robustness-related mutual 
information are derived. In the second part, perfect steganography is 
considered and some new schemes are presented that achieve perfect se- 
crecy and provide robustness against some attacks. In the last part, the 
robustness of some simplistic schemes is evaluated by tight lower bounds 
on the robustness-related mutual information. From these bounds, two 
criteria for robust embedding are derived. 



1 Introduction 

With the use of the Internet for the distribution of multimedia data, steganogra- 
phy has become a topic of growing interest. A number of programs for embedding 
hidden messages in images and audio files are available and the robustness of such 
embeddings is a controversial issue. Steganography is still in an experimental 
phase and no general theory is available that states the theoretical possibilities 
and limits. 

In this paper, we propose an information theoretic approach to steganogra- 
phy. The basis for this approach is a model that characterizes the embedding 
process and the attacker’s modification of the steganogram. Other authors have 
proposed a different information theoretic approach to steganography [1], [2], 
which is based on hypothesis testing and not on mutual information. In [3], a 
criterion for perfect steganography is considered, which relies on mutual infor- 
mation as in this paper. The approach is promising, but the robustness issue of 
digital watermarking is not addressed. 

The goal of this paper is to present a general model, which allows one to 
study two basic issues of a steganography scheme. The first issue is the perfect 

* Most of this work has been done while the author was with Digital Copyright Tech- 
nologies in Zurich. This work was supported by the latter and the Swiss National 
Science Foundation under grant No. 5003-045334 (KryPict).

A. Pfitzmann (Ed.): IH’99, LNCS 1768, pp. 1-16, 2000. 

© Springer-Verlag Berlin Heidelberg 2000
secrecy of the embedded message. This notion of a perfect steganographic system
is formulated and examples of perfect steganographic systems are presented. We 
show also that a certain spread-spectrum-like scheme can give perfect stego 
systems. This system has similar robustness properties against attacks as other 
stego systems based on spread-spectrum techniques (cf. [4]). 

The second basic issue is the robustness of the embedding. This issue is 
treated by making a suitable definition of a stego channel and by using mutual 
information as a measure for robustness. General bounds for the robustness- 
related mutual information are derived and, in the case of Gaussian signals, 
explicit formulas and tight bounds are given. 

2 A Model for Steganography and Digital Watermarking 

A stego system is used to transmit secret information V from a sender, say Alice, 
to a receiver, Bob, in such a way that an intermediate party, Eve, is not able to 
notice that the stego message X contains hidden secret information. If Eve has 
control over the channel, Eve can modify a suspect message X and transform it 
into a modified version Y. If such modification attacks might occur, the stego 
system should be robust against small distortions in the sense that the secret 
information will still reach Bob. 

The proposed stego model is intended to give a framework for digital water- 
marking. Compared to Simmons’ work [5], different criteria are used to charac- 
terize a valid stego message/image and possible attacks on stego messages. In 
the digital watermarking framework, unlike in the Prisoners’ Problem, the inter- 
mediate party cannot check whether a watermarked image was formed according 
to preset rules, e.g., satisfying an authentication scheme; the only requirement 
is that the watermark is unnoticeable. When attacking the stego message, Eve 
has a different goal than Simmons’ warden because she just wants to destroy 
the secret message (digital watermark) while maintaining an acceptable quality 
of the resulting modified image. 

2.1 The Model 

In a steganographic scheme, a secret message V is hidden with some cover data. 
The embedding of the secret message is done by the stego encoder which, depend- 
ing on some secret key K merges the secret message V into the cover data U, 
which is a sequence of random variables. For each key value k, the stego
encoder $f_k(., .)$ produces the stego message $X = f_k(U, V)$, which is again a
sequence of random variables (cf. Fig. 1). It is assumed that the encoder has an
encoder inverse $g_k(., .)$, i.e., $g_k(f_k(U, V), U) = V$. The stego message should
look genuine, i.e., the stego message should not be distinguishable from a typical 
message of the message source. A possible way to impose this condition math- 
ematically is by choosing a suitable distortion measure d(., .) and by requiring 
for every key value k the encoder constraint 



    $E[d(U, X)] \le \delta$    (1)
where the expectation operator E is with respect to the joint probability
distribution on U and X. The bound $\delta$ is chosen suitably small to guarantee that
the stego message X is essentially indistinguishable from the cover data U.
The secret key K, which is used for the embedding, is usually assumed to be a
vector of statistically independent and uniformly distributed random variables.
A commonly used distortion measure is the squared error distortion
$d(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2$ for vectors of length n.
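
As a rough illustration of the encoder constraint (1), the following sketch estimates E[d(U, X)] by simulation under the squared error distortion. The additive encoder, the Gaussian cover source, the amplitude 0.1 and the bound delta = 0.02 are all assumptions chosen for the example; none of them is taken from the paper.

```python
import random


def squared_error(x, y):
    """Squared error distortion d(x, y) = (1/n) * sum_i (x_i - y_i)^2."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def additive_encoder(u, v, amplitude=0.1):
    """Hypothetical encoder (an assumption): add a scaled +/-1 message to the cover."""
    return [ui + amplitude * vi for ui, vi in zip(u, v)]


def estimate_expected_distortion(encode, n, trials, rng):
    """Monte Carlo estimate of E[d(U, X)] for the given encoder."""
    total = 0.0
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]   # assumed Gaussian cover
        v = [rng.choice((-1, 1)) for _ in range(n)]   # assumed +/-1 message
        total += squared_error(u, encode(u, v))
    return total / trials


rng = random.Random(0)
delta = 0.02  # assumed bound for the encoder constraint (1)
estimate = estimate_expected_distortion(additive_encoder, n=64, trials=200, rng=rng)
constraint_ok = estimate <= delta
print(f"estimated E[d(U, X)] = {estimate:.4f}, constraint satisfied: {constraint_ok}")
```

For this particular encoder the distortion equals the squared amplitude regardless of the cover, so the estimate is essentially deterministic; a data-dependent encoder would need enough trials for the estimate to stabilize.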




Fig. 1. Model for Stego System 



In digital watermarking applications the embedding of the secret message 
should be robust in addition to satisfying the encoder constraint (1). This ro- 
bustness requirement can be modelled by a stego channel in the following way. 
The attacker (i.e., the crook) is allowed to modify the stego message only in a 
limited way for otherwise the quality of the message will suffer too much, i.e., a 
distorted image Y after an attack must still have a reasonable quality. This
quality requirement, which will be called the channel constraint, can be expressed
as

    $E[d(X, Y)] \le \epsilon$    (2)

where the expectation is with respect to the joint probability distribution on 
X and Y. This implies that the distorted image Y must be close to the stego 
image X for small $\epsilon$. The channel constraint can also be generalized to include
geometric transformations by the attacker. For example, to cope with orthogonal 
transformations of the image, one would impose the constraint 

    $\min_r E[d(X, r(Y))] \le \epsilon$    (3)

where the minimum is taken over all orthogonal transformations r. 

The stego channel is different from the channel models that are commonly 
used in communications [6], [7], such as the discrete memoryless channel. A 
stego channel can be much more general, e.g., an additive noise channel or a 
channel that performs compression. The channel constraint (2) specifies a whole 
class of channels, namely, all possible channels that satisfy (2). By choosing a
particular attack, the attacker selects (or creates) one fixed channel from this 
class of channels. 

In Fig. 1 it is implicitly assumed that the attacker has no knowledge of the 
secret message V, the secret key K or the source message U other than that 
contained in her observation X. For instance, if she knew U, she could set Y = U 
and deceive the receiver. The restricted knowledge of the attacker can be stated 
explicitly by saying that there is the following Markov chain of random variables 

    $(V, K, U) \to X \to Y$.    (4)

The goal of the decoder is to reconstruct the secret message V from the 
received (distorted) message Y using the secret key K. In many cases, it is 
assumed that the decoder has no knowledge of the cover message U, but there 
are applications where the decoder has access to the cover message and this 
knowledge can help in the decoding process. 

It is important to note that in the model shown in Fig. 1 the cover mes- 
sage source must produce a cover message with a certain kind of uncertainty. A 
Gaussian source of independent random variables would be a good source, but 
an image source, which only produces images of typewritten text would be a bad 
source. For such a text-image source, one could feed the image to a text scanner 
and reproduce the original text with high probability. This text-image source is 
bad because there is enough redundancy to allow almost perfect reconstruction 
(on the text level). A good cover message source produces little redundancy or, 
equivalently, much uncertainty remains even if parts of the message are known. 



2.2 Basic Issues: Secrecy and Robustness 

A watermark scheme is a stego system, where the digital watermark is the secret 
information V and typically contains a small amount of information. To be of 
practical use, the embedding must be extremely robust against modifications of 
the stego message, e.g., image watermarking should prevent a crook, like Eve, 
from removing the watermark by making a modified image Y, which is of good 
enough quality to be resold. 

From an information theoretic viewpoint the two basic issues, secrecy and 
robustness, can be characterized using the notion of mutual information (cf. [8]). 
The stego encoder should be designed such that for given encoder and channel 
constraints (1) and (2) 

(i) the stego message X reveals as little information about the secret
message V as possible, i.e., the mutual information $I(V; X)$ between
V and X is minimized, and

(ii) the stego message is robust in the sense that, given the secret key K,
the mutual information $I(V; Y|K)$ between the modified stego message
Y and the embedded secret message V is maximized over the class of
allowed channels.
If the cover message U is known, then the above conditions should be adapted
as follows: (i') if Eve knows U, then $I(V; X|U)$ should be minimized; (ii') if Bob
knows U, then $I(V; Y|U, K)$ should be maximized. It is important to note that if
Eve knows U, she can completely remove the watermark by setting Y = U. Thus,
in a watermarking application, one must ensure that Eve never gets possession
of the cover message.

If $I(V; X) = 0$ [or $I(V; X|U) = 0$ when Eve knows U] then Eve obtains no
information whatsoever about the secret message. Such a system will be called a
perfect stego system. In the watermark context, the minimum value of $I(V; Y|K)$
[or $I(V; Y|U, K)$ when Bob knows U] should be about 1 bit because one needs
at least about one bit to make a reliable decision about whether Y contains a
digital watermark or not. More generally, one needs about H(V) bits to read
the secret message V.

It is important to note that even if $I(V; Y|K)$ is sufficiently large, the task
of the actual extraction of the secret information V from Y might be difficult. 
For a stego or watermarking scheme to be practical, this decoding process must 
be reasonably simple. 

If one wishes to use the more general channel constraint (3) instead of (2), 
one should consider max I(V; r(Y)|K) over all orthogonal transformations and 
design the encoder to maximize this expression. 

2.3 Basic Bounds 

Using the stego model given in Section 2.1, one can obtain some useful and intu- 
itively appealing bounds on the basic mutual information quantities considered 
in Section 2.2. We make no restriction on the nature (discrete or continuous) of 
the considered random variables but we do assume that in the continuous case 
the considered entropies and mutual informations are well defined. 

Proposition 1. Let V, U, K, X and Y be (vector) random variables denoting 
the secret message, the cover message, the secret key, the stego message and the 
modified stego message, respectively, which are related by the stego model shown 
in Fig. 1. Then, the following statements hold: 

(i) The mutual information between the secret message and the modified 
message conditioned on knowing the secret key is upper bounded by the 
mutual information between the stego message and the modified message 
also conditioned on knowing the secret key, i.e., 

    $I(V; Y|K) \le I(X; Y|K)$.    (5)

(ii) The mutual information between the secret message and the modified 
message conditioned on knowing the secret key is upper bounded by the 
mutual information between the secret message and the stego message 
also conditioned on knowing the secret key, i.e., 



    $I(V; Y|K) \le I(V; X|K)$.    (6)

(iii) Knowledge of the cover message can be useful for extracting the secret
message from the modified stego message when the secret key is known,
i.e.,

    $I(V; Y|K, U) \ge I(V; Y|K)$.    (7)

Proof. The following identities can be easily derived from the definition of mutual 
information: 



    $I(V; Y|K) = I(V, K; Y) - I(K; Y)$    (8)

    $I(X; Y|K) = I(X; Y) - I(K; Y)$.    (9)

The Markov chain (4) implies $I(V, K; Y) \le I(X; Y)$ and claim (i) follows from
(8) and (9).

Note that V and K are statistically independent, hence, claim (ii) is equivalent
to $I(V; Y, K) \le I(V; X, K)$. This inequality holds because there is a Markov
chain $V \to (X, K) \to (Y, K)$, which is obtained from (4).

Using the fact that V, U and K are statistically independent one can rewrite 
the left side of claim (iii) as 

    $I(V; Y|K, U) = H(V|U, K) - H(V|U, K, Y) = H(V|K) - H(V|U, K, Y)$

    $\ge H(V|K) - H(V|K, Y) = I(V; Y|K)$,

where the inequality follows from the fact that additional conditioning on U 
decreases the entropy of V. 
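
The three claims of Proposition 1 can be checked numerically on a small discrete system. The sketch below is only an illustration under assumed parameters: the cover alphabet {0, 1, 2, 3} with a non-uniform distribution, the encoder X = (U + (V XOR K)) mod 4, and a channel that adds 1 modulo 4 with probability 0.2 are hypothetical choices, not constructions from the paper. They are picked so that V, K and U are independent and the Markov chain (4) holds.

```python
import math
from collections import defaultdict


def entropy(joint, idx):
    """Entropy in bits of the marginal over the coordinates listed in idx."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cond_mi(joint, a, b, c=()):
    """I(A; B | C) = H(A, C) + H(B, C) - H(A, B, C) - H(C)."""
    a, b, c = tuple(a), tuple(b), tuple(c)
    return (entropy(joint, a + c) + entropy(joint, b + c)
            - entropy(joint, a + b + c) - entropy(joint, c))


V, K, U, X, Y = range(5)            # coordinate positions in each outcome tuple
p_u = [0.6, 0.2, 0.1, 0.1]          # assumed non-uniform cover distribution
channel = ((0, 0.8), (1, 0.2))      # assumed attack: add 0 or 1 modulo 4

joint = defaultdict(float)
for v in (0, 1):
    for k in (0, 1):
        for u in range(4):
            x = (u + (v ^ k)) % 4   # hypothetical invertible encoder f_k(u, v)
            for shift, pc in channel:
                joint[(v, k, u, x, (x + shift) % 4)] += 0.25 * p_u[u] * pc

i_vy_k = cond_mi(joint, [V], [Y], [K])
i_xy_k = cond_mi(joint, [X], [Y], [K])
i_vx_k = cond_mi(joint, [V], [X], [K])
i_vy_ku = cond_mi(joint, [V], [Y], [K, U])

print(f"I(V;Y|K)   = {i_vy_k:.4f}")
print(f"I(X;Y|K)   = {i_xy_k:.4f}  (claim (i):   upper bound)")
print(f"I(V;X|K)   = {i_vx_k:.4f}  (claim (ii):  upper bound)")
print(f"I(V;Y|K,U) = {i_vy_ku:.4f}  (claim (iii): lower bound is I(V;Y|K))")
```

A single example does not prove anything, but changing `p_u` or the channel probabilities is a quick way to see how much the cover knowledge in claim (iii) helps the decoder.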

It is important to note that the upper bound (6) on I(V; Y|K) is tight and, 
in general, much smaller than the trivial bound H(V). This means that, when 
extracting the watermark without knowledge of the cover image U, one cannot 
obtain the full information on the message V and, therefore, the message should 
be coded using a suitable error correcting code. 

When the cover image U is known at the stego decoder, the upper bound 
on the information about V that can be extracted from the modified stego mes- 
sage Y is H(V). The following theorem gives some general lower bounds for 
the robustness-related mutual information quantities in the case of real-valued 
images. 

Theorem 1. Suppose that the cover image $U = [U_1, \ldots, U_n]$, the stego
image $X = [X_1, \ldots, X_n]$, and the modified stego image $Y = [Y_1, \ldots, Y_n]$ have
real-valued random components. If the channel constraint is based either on the
distortion measure $d_1(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2$ or $d_2(x, y) = \max_i (x_i - y_i)^2$,
then the following bounds hold:

    (i)   $I(X; Y) \ge \max\{H(X), H(Y)\} - \frac{n}{2} \log_2(2\pi e \epsilon)$

    (ii)  $I(V; Y|U, K) \ge H(Y|U, K) - \frac{n}{2} \log_2(2\pi e \epsilon)$
Proof. We start the proof by stating two identities, which are easy to derive ((10)
follows from the Markov property (4) and the fact that X is uniquely determined
by V, U and K; (11) is analogous to (8)):

    $I(Y; V, U, K) = I(Y; X)$    (10)

    $I(V; Y|U, K) = I(Y; X) - I(Y; U, K)$    (11)

To show (i), we consider the following chain of inequalities, where the first
inequality follows from the fact that conditioning does not increase entropy.

    $I(X; Y) = H(Y) - H(Y|X) = H(Y) - H(Y - X|X)$    (12)

    $\ge H(Y) - H(Y - X)$    (13)

    $\ge H(Y) - \sum_{i=1}^{n} H(Y_i - X_i)$    (14)

    $\ge H(Y) - \frac{1}{2} \sum_i \log_2(2\pi e E[(Y_i - X_i)^2])$    (15)

    $= H(Y) - \frac{1}{2} \log_2((2\pi e)^n \prod_i E[(Y_i - X_i)^2])$    (16)

    $\ge H(Y) - \frac{1}{2} \log_2((2\pi e)^n (\sum_i E[(Y_i - X_i)^2]/n)^n)$    (17)

    $\ge H(Y) - \frac{n}{2} \log_2(2\pi e \epsilon)$    (18)



Inequality (14) follows from $H(Y - X) \le \sum_{i=1}^{n} H(Y_i - X_i)$ (see Chap. 9 in [8]).
The entropies of the n components are maximized if they are Gaussian, in which
case, $H(Y_i - X_i) \le \frac{1}{2} \log_2(2\pi e E[(Y_i - X_i)^2])$ (cf. Chap. 9 in [8]). This implies
(15). Inequality (17) follows from the fact that the geometric mean is upper
bounded by the arithmetic mean. The last inequality is a consequence of the
channel constraint (2) with respect to the distortion measure $d_1(., .)$ or $d_2(., .)$.

The same argument can also be applied to $I(X; Y) = H(X) - H(X|Y)$ and
this implies that one can take $\max\{H(X), H(Y)\}$ in (i).

The second statement (ii) follows from (11) and statement (i). 
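
A quick numerical illustration of bound (i): for a scalar Gaussian stego signal attacked by independent additive Gaussian noise of variance exactly epsilon, the bound is met with equality. The sketch below assumes n = 1, a stego variance of 4.0 and epsilon = 0.25; these values are arbitrary, and it uses the standard formula for the differential entropy of a Gaussian random variable.

```python
import math


def gaussian_entropy_bits(variance):
    """Differential entropy (bits) of a scalar Gaussian with the given variance."""
    return 0.5 * math.log2(2 * math.pi * math.e * variance)


def theorem1_lower_bound(h_bits, n, eps):
    """Right-hand side of bound (i): H - (n/2) * log2(2*pi*e*eps)."""
    return h_bits - (n / 2) * math.log2(2 * math.pi * math.e * eps)


def awgn_mutual_information_bits(var_x, var_z):
    """I(X; Y) for Y = X + Z with independent Gaussian X and Z."""
    return 0.5 * math.log2(1 + var_x / var_z)


var_x, eps = 4.0, 0.25                           # assumed example values
h_y = gaussian_entropy_bits(var_x + eps)         # here H(Y) >= H(X), so H(Y) is the max
bound = theorem1_lower_bound(h_y, n=1, eps=eps)
exact = awgn_mutual_information_bits(var_x, eps)
print(f"lower bound = {bound:.6f} bits, exact I(X;Y) = {exact:.6f} bits")
```

Any other attack that satisfies the same channel constraint leaves at least this much mutual information between X and Y, which is what makes the bound useful for robustness arguments.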

In the case of binary (black-and-white) images, one can derive similar bounds
on the robustness-related mutual information as above. The relevant distortion
measure is based on the average Hamming distance given by $\frac{1}{n} d_H(x, y)$, if the
images have n components. Since the encoder $f_k(., .)$ has an encoder inverse, when
given U and K, the secret message V uniquely determines the stego message
X and, conversely, X uniquely determines V. Thus, $H(V|U, K) = H(X|U, K)$.
This implies $I(V; Y|U, K) = H(V) - H(X|U, K, Y)$ and the last entropy term
equals $H(Y - X|U, K, Y)$. A similar proof as above implies the following
theorem.



Theorem 2. If in a stego system the stego image $X = [X_1, \ldots, X_n]$ and the
modified stego image $Y = [Y_1, \ldots, Y_n]$ have binary components satisfying the
channel constraint $E[\frac{1}{n} d_H(X, Y)] \le \epsilon < 1/2$, then

    $I(V; Y|K, U) \ge H(V) - n h(\epsilon)$,    (19)

where $h(p) = -p \log_2(p) - (1 - p) \log_2(1 - p)$ is the binary entropy function.
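
To get a feeling for bound (19), the sketch below evaluates H(V) - n h(epsilon) for an assumed 64-bit message embedded in a binary image with n = 1024 components. The message size, image size and flip rates are illustrative assumptions only.

```python
import math


def binary_entropy(p):
    """Binary entropy function h(p) in bits, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def theorem2_lower_bound(h_v_bits, n, eps):
    """Right-hand side of (19): H(V) - n * h(eps), valid for 0 <= eps < 1/2."""
    if not 0 <= eps < 0.5:
        raise ValueError("eps must satisfy 0 <= eps < 1/2")
    return h_v_bits - n * binary_entropy(eps)


h_v, n = 64.0, 1024                       # assumed message entropy and image size
bound_small = theorem2_lower_bound(h_v, n, eps=0.001)
bound_large = theorem2_lower_bound(h_v, n, eps=0.01)
print(f"eps = 0.001: I(V;Y|K,U) >= {bound_small:.2f} bits")
print(f"eps = 0.01 : I(V;Y|K,U) >= {bound_large:.2f} bits")
```

With these numbers the bound is informative for the smaller flip rate but becomes negative, and therefore vacuous, for the larger one: it only says something when H(V) exceeds n h(epsilon).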

3 Perfect Steganography 

A stego system is considered perfect if it satisfies the encoder constraint (1), 
the role of which is to ensure imperceptibility, and Shannon’s perfect secrecy 
condition $I(V; X) = 0$ [9]. Thus, it is not astonishing that one obtains a perfect
stego system if the secret message is first encrypted using the Vernam cipher 
(or one-time pad) and then embedded using any embedding function. A jus- 
tified criticism of the simple Vernam stego system is that it does not provide 
robustness. For instance, an additive white Gaussian noise attack on the stego 
message is likely to lead to decoding errors. This is the motivation to consider 
a generalization of the Vernam cipher for which the robustness can be adapted 
and increased. 

3.1 Permutation Modulation 

The proposed perfect stego scheme is based on an encoding technique, which 
is called permutation modulation and which was introduced by Slepian [10]. 
The cover message is modelled as a finite sequence of real random variables 
$U = [U_0, U_1, \ldots, U_{n-1}]$. The secret key $K = [K_0, K_1, \ldots, K_{n-1}]$ is a vector of
statistically independent, identically distributed (i.i.d.) real random variables. 
The secret message is a discrete random variable V, which selects a permutation 
of n letters. 

The perfect stego encoder is given by the following encoding rule 

    $X = U + K^V$    (20)

where $K^V = [K_{v_0}, K_{v_1}, \ldots, K_{v_{n-1}}]$ denotes the vector K with components
permuted according to the permutation V, which is given by $(v_0, v_1, \ldots, v_{n-1})$.

Proposition 2. Suppose that the secret message V, the cover message U and 
the secret key K are jointly statistically independent. Then, 

(i) $I(V; X|U) = 0$

(ii) $I(V; X) = 0$.

Proof. Claim (i): Since V and U are statistically independent, $I(V; X|U) =
I(V; X, U)$. Since $K^V$ is determined by X and U, and since by hypothesis U is
statistically independent of the pair $(V, K^V)$, one obtains

    $I(V; X, U) = H(V) - H(V|X, U)$

    $= H(V) - H(V|K^V, U)$

    $= H(V) - H(V|K^V)$.

It remains to show that $H(V) - H(V|K^V) = 0$ or, equivalently, that V and $K^V$
are statistically independent.

Let p(.|.) denote the conditional probability density function (pdf) of $K^V$
given V (a similar argument goes through for discrete random variables, where
the pdf is replaced by a conditional probability distribution). Since all
components of K are i.i.d., the conditional pdf does not depend on the permutation
V = v, i.e.,

    $p(k_{v_0}, k_{v_1}, \ldots, k_{v_{n-1}} | v) = p(k_0) p(k_1) \cdots p(k_{n-1})$,    (21)

where p(.) denotes the pdf of the components $K_i$. Hence, $K^V$ and V are
statistically independent.

Claim (ii): $I(V; X) = 0$ follows from claim (i), using $H(V|X, U) \le H(V|X)$.
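
Proposition 2 can be checked by exhaustive enumeration on a tiny discrete system. The following sketch assumes n = 3, integer-valued cover and key components, and a non-uniform distribution on the six permutations; all of these are illustrative choices. For contrast it also evaluates a key whose components are independent but not identically distributed, where the i.i.d. argument behind (21) no longer applies.

```python
import itertools
import math
from collections import defaultdict


def entropy(pmf):
    """Entropy in bits of a probability mass function given as a dict."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def mutual_information(joint):
    """I(A; B) in bits for a joint pmf given as {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return entropy(pa) + entropy(pb) - entropy(joint)


def leakage(key_pmfs, cover_pmf, perm_pmf):
    """Return (I(V; X), I(V; X, U)) for the encoder X = U + K^V of (20)."""
    n = len(key_pmfs)
    joint_vx, joint_vxu = defaultdict(float), defaultdict(float)
    key_values = [list(pmf) for pmf in key_pmfs]
    for v, pv in perm_pmf.items():
        for k in itertools.product(*key_values):
            pk = math.prod(key_pmfs[i][k[i]] for i in range(n))
            for u in itertools.product(cover_pmf, repeat=n):
                pu = math.prod(cover_pmf[ui] for ui in u)
                x = tuple(u[i] + k[v[i]] for i in range(n))  # permuted key added
                joint_vx[(v, x)] += pv * pk * pu
                joint_vxu[(v, (x, u))] += pv * pk * pu
    return mutual_information(joint_vx), mutual_information(joint_vxu)


perms = list(itertools.permutations(range(3)))
perm_pmf = dict(zip(perms, [0.4, 0.2, 0.1, 0.1, 0.1, 0.1]))  # assumed message pmf
cover_pmf = {0: 0.5, 1: 0.3, 2: 0.2}                         # assumed cover pmf

iid_key = [{-1: 0.3, 0: 0.2, 2: 0.5}] * 3                    # i.i.d. key components
iid_vx, iid_vxu = leakage(iid_key, cover_pmf, perm_pmf)

# Contrast: independent but NOT identically distributed key components.
skewed_key = [{0: 1.0}, {-1: 0.5, 1: 0.5}, {-1: 0.5, 1: 0.5}]
skew_vx, skew_vxu = leakage(skewed_key, cover_pmf, perm_pmf)

print(f"i.i.d. key:  I(V;X) = {iid_vx:.2e}, I(V;X,U) = {iid_vxu:.2e}")
print(f"skewed key:  I(V;X) = {skew_vx:.4f}, I(V;X,U) = {skew_vxu:.4f}")
```

Since V and U are independent, I(V; X, U) equals I(V; X|U), so the second returned value corresponds to claim (i). In the skewed case the position of the constant key component reveals part of the permutation to anyone who knows the cover.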

In communications, one considers additive noise channels for applying per- 
mutation modulation. This situation can also be modelled in the stego context 
where the intended recipient, Bob, has access to the cover message and where 
the crook, Eve, is only allowed to distort the stego message by additive noise. In 
this case, Bob can form $Y - U = K^V + Z$, where Z denotes the additive noise
introduced by Eve. 

If the i.i.d. components of the key K are symmetrical random variables, i.e.,
$p(k_i) = p(-k_i)$, then the encoder can be extended to include sign changes. The
resulting scheme is again a perfect stego scheme. If one uses no permutations
but only sign changes and a secret key K of coin-tossing ±1 valued components,
one obtains the Vernam cipher with an embedding.

Example 1. Suppose that in a mini stego system, the secret key K consists of 8
components, which are ±1 valued. Choose a sample vector k with 4 components
having value 1 (and 4 components having value -1). One obtains a perfect stego
system because (21) is satisfied. By permuting the components of k, a total
of $\binom{8}{4} = 70$ different vectors are produced, i.e., 70 different messages can be
embedded. Compared to the Vernam stego system of length 8, the rate has been
reduced to $(\log_2 70)/8$ bit per component. In exchange, the minimum squared
Euclidean distance between distinct embedded codewords is $2^2 + 2^2 = 8$, which is
twice the squared minimum distance of the Vernam scheme. This illustrates how
the information rate can be traded for increased robustness.
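
The numbers in Example 1 can be reproduced by brute force. The sketch below enumerates all distinct rearrangements of a ±1 vector with four components equal to 1, and compares them with the full set of 2^8 sign vectors that a Vernam-type scheme of length 8 would use; the helper names are introduced here for illustration only.

```python
import itertools
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_sq_dist(words):
    """Minimum squared distance over all pairs of distinct codewords."""
    return min(sq_dist(a, b) for a, b in itertools.combinations(words, 2))


n, ones = 8, 4
# Every distinct permutation of k is fixed by the positions of the +1 entries.
codewords = [
    tuple(1 if i in positions else -1 for i in range(n))
    for positions in itertools.combinations(range(n), ones)
]
vernam_words = list(itertools.product((-1, 1), repeat=n))

num_messages = len(codewords)
rate = math.log2(num_messages) / n
perm_min = min_sq_dist(codewords)
vernam_min = min_sq_dist(vernam_words)

print(f"messages: {num_messages}, rate: {rate:.4f} bit per component")
print(f"min squared distance: permutation code {perm_min}, Vernam {vernam_min}")
```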

3.2 Cyclic Shift Modulation 

A low rate modulation scheme is now introduced which is based on maximum- 
length sequences (m-sequences). The resulting scheme has robustness properties 
similar to watermarking schemes that rely on spread-spectrum sequences [4] . The 
detection process makes use of the good correlation properties of m-sequences. 

Let $s = [s_0, s_1, \ldots, s_{n-1}]$ be a ±1 valued m-sequence of length n ($n = 2^m - 1$)
and let $k = [k_0, k_1, \ldots, k_{n-1}]$ be the secret key with i.i.d. and symmetrically
distributed components. The code set for the embedding is constructed as follows:

    $C = \{Sh^j(s) \odot k : j = 0, 1, \ldots, n-1\} \cup \{k\}$,    (22)
where $Sh^j(s) = [s_j, s_{j+1}, \ldots, s_{j+n-1}]$ is the cyclic rightward shift by j positions
of s (the indices are reduced modulo n) and $\odot$ denotes componentwise
multiplication of two vectors. There are $M = n + 1 = 2^m$ codewords in C; hence, one
can encode m bits using this code of length $n = 2^m - 1$. An additional bit b can
be encoded by selecting the polarity (±1) of each codeword. The embedding (or
encoding) of a secret message $v \in \{0, 1, \ldots, 2^m - 1\}$ within a cover message u
can be defined by

    $x = u + (-1)^b \cdot c$    (23)

where $c = Sh^v(s) \odot k$ if $v < 2^m - 1$ and $c = k$ if $v = 2^m - 1$. The stego scheme
based on (23) will be called a Cyclic Shift Modulation (CSM) scheme. In a CSM
scheme the m-sequence s need not be kept secret; the only secret parameter that
has to be passed over a secure channel to the intended receiver is the secret key.

Noting that the conditional probability density of X - U given V satisfies
$p(Sh^j(s) \odot k | v) = p(k)$, a proof similar to that of Proposition 2 gives the following

Theorem 3. Cyclic shift modulation provides a perfect stego system, i.e.,

    $I(V; X|U) = 0 = I(V; X)$.



The decoding at the receiver is based on correlation. Recovering the secret 
message v from the received message y is equivalent to finding the corresponding 
codeword that was embedded. For each code sequence c, the decoder forms the 
scalar product

    $\langle y, c \rangle = \sum_{j=0}^{n-1} y_j c_j$

and chooses as the decoder output c (one of) the codeword(s) that maximize(s)
the magnitude of this scalar product. The additional bit b, which selects the
polarity of the codeword, can be recovered from the sign of $\langle y, c \rangle$.

If the secret key K consists of random ±1 valued i.i.d. components, then
the resulting code C has optimal cross correlation properties, i.e., Welch's lower
bound on the maximum value $c_{max}^2$ of $(\langle c, c' \rangle)^2$ for $c \ne c'$ is achieved, which
states that $c_{max}^2 \ge n(M - n)/(M - 1)$ (cf. [11]). This optimality result follows
from the correlation property of the code as stated in the following theorem.

Theorem 4. If the secret key K consists of random ±1 valued i.i.d. components,
then the code defined by (22) has the cross correlation values

    $\langle c, c' \rangle = \begin{cases} n & \text{if } c = c' \\ -1 & \text{otherwise} \end{cases}$    (24)

where n denotes the code sequence length.



Proof. The case c = c' is evident.

Suppose $c \ne c'$ and consider the case where the code sequences are of the
form $Sh^j(s) \odot k$ and $Sh^l(s) \odot k$, respectively, with $j \ne l$. For notational
convenience in the indexing, we consider $Sh^j(s)$ to be a length-n subsequence of the
infinite periodically continued m-sequence s. Using the fact that the key components
satisfy $k_i^2 = 1$, one has

    $\langle Sh^j(s) \odot k, Sh^l(s) \odot k \rangle = \sum_{i=0}^{n-1} s_{j+i} k_i \, s_{l+i} k_i = \sum_{i=0}^{n-1} s_{j+i} s_{l+i} = -1$,

where the last equality follows from the correlation property of m-sequences.
Finally, the case where one of the two code sequences equals k can be proved
similarly.
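
The correlation values of Theorem 4 and the correlation decoder can be tried out on a small instance. The sketch below is a toy construction under stated assumptions: m = 4, an m-sequence generated from the recurrence a[i+4] = a[i+1] XOR a[i] (primitive polynomial x^4 + x + 1) with bit 0 mapped to +1 and bit 1 to -1, a Gaussian cover, additive Gaussian attack noise, and a receiver who knows the cover and correlates on y - u. The function names are hypothetical.

```python
import random


def m_sequence_15():
    """+/-1 valued m-sequence of length 15 from a[i+4] = a[i+1] XOR a[i]."""
    bits = [1, 0, 0, 0]
    while len(bits) < 15:
        bits.append(bits[-3] ^ bits[-4])
    return [1 - 2 * b for b in bits]   # bit 0 -> +1, bit 1 -> -1


def cyclic_shift(s, j):
    """Sh^j(s) = [s_j, s_{j+1}, ..., s_{j+n-1}] with indices taken modulo n."""
    n = len(s)
    return [s[(i + j) % n] for i in range(n)]


def csm_code(s, k):
    """Code set (22): every shift of s multiplied componentwise by k, plus k."""
    shifted = [[a * b for a, b in zip(cyclic_shift(s, j), k)] for j in range(len(s))]
    return shifted + [list(k)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def embed(u, code, v, b):
    """Embedding rule (23): x = u + (-1)^b * c, with c the codeword for v."""
    sign = -1 if b else 1
    return [ui + sign * ci for ui, ci in zip(u, code[v])]


def decode(r, code):
    """Pick the codeword with the largest |<r, c>|; the sign gives the bit b."""
    scores = [dot(r, c) for c in code]
    v_hat = max(range(len(code)), key=lambda idx: abs(scores[idx]))
    return v_hat, (0 if scores[v_hat] > 0 else 1)


rng = random.Random(1)
s = m_sequence_15()
n = len(s)
k = [rng.choice((-1, 1)) for _ in range(n)]      # +/-1 valued secret key
code = csm_code(s, k)
M = len(code)

off_peak = {dot(code[a], code[b]) for a in range(M) for b in range(M) if a != b}
welch = n * (M - n) / (M - 1)                    # Welch bound on c_max^2

u = [rng.gauss(0, 10) for _ in range(n)]         # assumed Gaussian cover
x = embed(u, code, v=5, b=1)
y = [xi + rng.gauss(0, 0.3) for xi in x]         # assumed additive noise attack
residual = [yi - ui for yi, ui in zip(y, u)]     # receiver is assumed to know u
decoded = decode(residual, code)

print(f"off-peak correlations: {off_peak}, Welch bound: {welch}")
print(f"decoded (v, b) = {decoded}")
```

When the cover is not available at the receiver, the cover itself acts as additional noise in the correlation, so a much longer sequence would be needed for reliable decoding than in this toy setting.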

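For general secret keys, without the ±1 restriction on the components $K_i$,
the cross correlation values will be scaled by the variance $\sigma^2 = Var(K_i)$ and
(24) no longer holds. However, one expects that the peak-to-off-peak ratio will
be roughly the same as for the m-sequence because, when taking the average
over all secret keys, one obtains

    $E[\langle Sh^j(s) \odot K, Sh^l(s) \odot K \rangle] = \sigma^2 \sum_{i=0}^{n-1} s_{j+i} s_{l+i} = -\sigma^2$ for $j \ne l$,

compared with $n \sigma^2$ for j = l.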

Remark 1. One might also expect that a standard synchronous CDMA scheme
based on the code C given by (22) will provide a perfect stego system. However,
this is not the case because standard encoding by selecting the parity of each
spreading sequence $c^j$ according to the bits of the secret message does not yield
a resulting sum vector $x = \sum_j (-1)^{v_j} \cdot c^j$ that is statistically independent of
the secret message $[v_0, v_1, \ldots, v_n]$. Thus, one does not obtain the desired result
$I(V; X) = 0$.

4 The Robustness of Some Simple Schemes 

For the evaluation of stego and watermarking schemes on the basis of information 
theory and, in particular, using criteria (i) and (ii) in Section 2.2, one needs a 
stochastic model of the message source. In this section, we either assume that 
the cover image is Gaussian or that the cover image has an arbitrary real-valued 
probability distribution while the stego channel satisfies certain orthogonality 
conditions. Using this simplistic model, one can find tight bounds on I(V; Y) for 
various scenarios and make some interesting conclusions. In particular, some of 
the experimental results presented in [12] can be given a theoretical explanation. 

The main focus of this section is the robustness of the embedding of the secret 
message V. For simplicity, we assume that the secret message V is identical with 
the embedded vector V (that is, no encryption and no error-correcting scheme is 
used). Thus, we assume that V is a vector of i.i.d. zero-mean Gaussian random 
variables and that the stego encoder is a sum encoder, i.e., 



\[ X = U + V. \qquad (25) \]

In the remaining part of this section, we will make the simplifying assumption
that the cover message U has independent components and, hence, also
X has independent components. This assumption can be approximately met by
choosing a sparse sub-image, i.e., by selecting a suitable subset
$U_{i_1}, U_{i_2}, \ldots, U_{i_l}$ of the components of U. The setting is further
simplified by considering each component individually, i.e., by studying random
variables instead of random vectors.



[Figure: block diagram in which V is added to U to give X, and Z is added to X to give Y]

Fig. 2. Sum Encoder and Additive Noise Stego Channel



Example 2. (Additive Noise Attack) The attacker is only allowed to distort
the stego message X with additive white Gaussian noise (AWGN) Z; thus, the
distorted message is given by Y = X + Z. Fig. 2 shows the block diagram of
the corresponding system. The robustness-related mutual information is given
by the well-known formula (cf. Chap. 10.1 in [8])

\[ I(Y;V) = \frac{1}{2}\log_2\left(1 + \frac{\mathrm{Var}(V)}{\mathrm{Var}(U) + \mathrm{Var}(Z)}\right). \qquad (26) \]



This formula implies I(Y;V) > 0, i.e., an AWGN attack can never completely
remove the information about the secret message V. Note that for a given
variance VarZ, the Gaussian distribution is the worst stego-message independent
attack [8]. If the cover message U is known at the receiver, then one has to
consider the conditional information, which is the capacity formula for a Gaussian
channel (cf. Chap. 10 in [8])

\[ I(Y;V \mid U) = \frac{1}{2}\log_2\left(1 + \frac{\mathrm{Var}(V)}{\mathrm{Var}(Z)}\right). \]
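
The two formulas of Example 2 are easy to evaluate numerically. The following
Python sketch does so; the function names and the sample variances in the usage
note are illustrative assumptions, not values from the paper.

```python
import math

def mi_awgn(var_u, var_v, var_z):
    """I(Y;V) from (26): the unknown cover U acts as additional Gaussian noise."""
    return 0.5 * math.log2(1 + var_v / (var_u + var_z))

def mi_awgn_known_cover(var_v, var_z):
    """I(Y;V|U): Gaussian channel capacity formula when the receiver knows U."""
    return 0.5 * math.log2(1 + var_v / var_z)
```

For instance, with Var(U) = 2, Var(V) = 3 and Var(Z) = 1 the first function
gives 0.5 bit per component, while knowing the cover raises the value to 1 bit.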



Since the mutual information I(V;Y) remains invariant, when Y is normal- 
ized to have zero mean, we can without loss of essential generality assume that Y 
has zero mean, which will be done within the remaining part of this paper. 

In a general attack, the distortion introduced by the attacker is

\[ Z = Y - X. \qquad (27) \]



By normalizing the cover image U, we can assume that U is zero-mean. Thus, we
have that X, Y, and Z are zero mean and the channel constraint
$E[(Y - X)^2] \leq \varepsilon$ implies $\mathrm{Var}(Z) \leq \varepsilon$.

The attacker can choose the distortion Z to depend on X, which generalizes
the AWGN attack. The distortion Z can be split into the projection $Z^{\pi}$ onto
the space spanned by the random variable X, which minimizes the expectation
$E[(Z - Z^{\pi})^2]$, and the orthogonal complement $Z^{\perp}$ (cf. Chap. 11 in [13]). Thus,
one has the orthogonal decomposition

\[ Z = \rho X + Z^{\perp}, \]

where $\rho = E[ZX]/E[X^2]$.

Since X is the sum of the independent random variables U and V, the random
variable $X^{\perp} = (\mathrm{Var}V)U - (\mathrm{Var}U)V$ is orthogonal to X. The pairs of random
variables (U, V) and $(X, X^{\perp})$ are related by an invertible linear transformation
and, thus, the Markov chain (4) can be rewritten as $(X, X^{\perp}) \to X \to Y - X$,
yielding the Markov chain

\[ X^{\perp} \to X \to Z. \qquad (28) \]

If U and V are zero-mean Gaussian then X and $X^{\perp}$ are also zero-mean Gaussian.
Being orthogonal, X and $X^{\perp}$ are statistically independent. By (28), $X^{\perp}$
and Z are also statistically independent. Thus, also $Z^{\perp} = Z - \rho X$ is independent
of $X^{\perp}$ and, by construction, it is orthogonal to X. Therefore, $Z^{\perp}$ being
orthogonal to X and $X^{\perp}$, it is orthogonal to U and V. These results are summarized
in the following lemma.

Lemma 1. If U and V are statistically independent, zero-mean Gaussian random
variables, then

\[ E[Z^{\perp}U] = 0 \quad \text{and} \quad E[Z^{\perp}V] = 0. \qquad (29) \]

Remark 2. The equations (29) are crucial to derive the lower bound given in
the next theorem. Note that U need not necessarily be Gaussian for (29) to
hold. For non-Gaussian U, it is enough, e.g., that the attacker's distortion is
restricted to be of the form $Z = \rho' X + Z^{\perp}$, where $\rho'$ denotes any real number
and $Z^{\perp}$ is independent of X. For the non-Gaussian case, (29) can be viewed as
an orthogonality condition on the stego channel.

The first two steps in the derivation of the lower bound rely on the fact that
conditioning reduces entropy and the Gaussian distribution maximizes entropy
for a given variance, i.e., for any real number $\mu$, one has

\begin{align*}
I(V;Y) &= H(V) - H(V \mid Y) \\
       &= H(V) - H(V - \mu Y \mid Y) \\
       &\geq H(V) - H(V - \mu Y) \qquad (30) \\
       &\geq H(V) - \tfrac{1}{2}\log_2\bigl(2\pi e\,\mathrm{Var}(V - \mu Y)\bigr). \qquad (31)
\end{align*}



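Since all involved random variables are zero-mean, (29) implies that

\begin{align*}
\mathrm{Var}(V - \mu Y) &= E[(\mu(X + \rho X + Z^{\perp}) - V)^2] \\
 &= E[(\mu(1+\rho)U + (\mu(1+\rho) - 1)V + \mu Z^{\perp})^2] \\
 &= \mu^2(1+\rho)^2\,\mathrm{Var}U + (\mu(1+\rho) - 1)^2\,\mathrm{Var}V + \mu^2\,\mathrm{Var}Z^{\perp} \\
 &= \mu^2[(1+\rho)^2\,\mathrm{Var}X + \mathrm{Var}Z^{\perp}] + (1 - 2\mu(1+\rho))\,\mathrm{Var}V.
\end{align*}

This variance is minimized by setting $\mu = E[VY]/E[Y^2]$, which yields

\[ \mathrm{Var}\left(V - \frac{E[VY]}{E[Y^2]}\,Y\right) = (\mathrm{Var}V)\left(1 - \frac{\mathrm{Var}V}{\mathrm{Var}X + \frac{\mathrm{Var}Z^{\perp}}{(1+\rho)^2}}\right) \]

and gives the lower bound

\[ I(V;Y) \geq \frac{1}{2}\log_2 \frac{1}{1 - \frac{\mathrm{Var}V}{\mathrm{Var}X + \frac{\mathrm{Var}Z^{\perp}}{(1+\rho)^2}}}. \qquad (32) \]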



The attacker wants to minimize I(V;Y). Without any channel constraint,
the lower bound is minimized by choosing $\rho = -1$, which actually implies
I(V;Y) = 0. Choosing $\rho = -1$ means that Y consists only of the distortion
term $Z^{\perp}$, which in general is not allowed because of the channel constraint

\[ \rho^2\,\mathrm{Var}X + \mathrm{Var}Z^{\perp} = \mathrm{Var}Z \leq \varepsilon. \qquad (33) \]

The lower bound (32) is minimized by maximizing $\mathrm{Var}Z^{\perp}/(1+\rho)^2$. When taking
the constraint (33) into account, this is a constrained maximization problem. It
is easily solved by using (33) to eliminate the term
$\mathrm{Var}Z^{\perp} = \varepsilon - \rho^2\,\mathrm{Var}X$ and then finding the maximum of
$\mathrm{Var}Z^{\perp}/(1+\rho)^2 = (\varepsilon - \rho^2\,\mathrm{Var}X)/(1+\rho)^2$ with
respect to $\rho$. The maximum, $\varepsilon\,\mathrm{Var}X/(\mathrm{Var}X - \varepsilon)$, is achieved
for $\rho = -\varepsilon/\mathrm{Var}X$ and the lower bound on the mutual information becomes

\[ I(V;Y) \geq \frac{1}{2}\log_2 \frac{1}{1 - \frac{\mathrm{Var}V}{(1 + \varepsilon/(\mathrm{Var}X - \varepsilon))\,\mathrm{Var}X}}. \qquad (34) \]

When $\varepsilon = 0$, we obtain the lower bound

\[ I(V;X) \geq \frac{1}{2}\log_2 \frac{1}{1 - \frac{\mathrm{Var}V}{\mathrm{Var}X}}, \]



which actually holds with equality since (30) and (31) hold with equality because
the random variables in question are independent and Gaussian. Recalling (6),
this proves the following theorem.

Theorem 5. Suppose that the distortion Z = Y - X of the stego message X
satisfies the channel constraint $E[Z^2] \leq \varepsilon$ and that the secret message V is
zero-mean Gaussian. If the cover image component U is Gaussian, then the mutual
information between the secret message V and the stego image Y is bounded by

\[ \frac{1}{2}\log_2 \frac{1}{1 - \frac{\mathrm{Var}V}{(1 + \varepsilon/(\mathrm{Var}X - \varepsilon))\,\mathrm{Var}X}} \leq I(V;Y) \leq \frac{1}{2}\log_2 \frac{1}{1 - \frac{\mathrm{Var}V}{\mathrm{Var}X}}. \]

Remark 3. The lower bound also holds in the case where U is not necessarily
Gaussian but the stego channel satisfies the orthogonality constraint (29). For
non-Gaussian U, the upper bound no longer holds. When there is no attack, i.e.,
$\varepsilon = 0$, one has the lower bound
$\tfrac{1}{2}\log_2(1/(1 - \mathrm{Var}V/\mathrm{Var}X)) \leq I(V;X)$.

The derivation above gives a characterization of the worst case attack, viz.,
produce a distortion $Z = \rho X + Z^{\perp}$ with $\rho = -\varepsilon/\mathrm{Var}X$, where $Z^{\perp}$ is
zero-mean Gaussian with variance $\varepsilon(1 - \varepsilon/\mathrm{Var}X)$ and statistically
independent of X. This attack achieves the lower bound.
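
The bounds of Theorem 5 and the worst case attack can be checked numerically.
The sketch below is a minimal illustration under stated assumptions: it uses the
standard fact that for jointly Gaussian zero-mean V and Y the mutual information
is -1/2 log2(1 - E[VY]^2/(VarV VarY)), and it models the attack as
Y = (1 + rho) X + Z_perp with Z_perp independent of U and V. All function names
are hypothetical.

```python
import math

def mi_linear_gaussian(var_u, var_v, rho, var_z_perp):
    """I(V;Y) for Y = (1 + rho) X + Z_perp, X = U + V, all Gaussian and zero-mean."""
    var_x = var_u + var_v
    var_y = (1 + rho) ** 2 * var_x + var_z_perp
    corr_sq = ((1 + rho) * var_v) ** 2 / (var_v * var_y)  # E[VY]^2 / (VarV VarY)
    return -0.5 * math.log2(1 - corr_sq)

def lower_bound(var_u, var_v, eps):
    """Left-hand side of Theorem 5 (equation (34)); requires eps < VarX."""
    var_x = var_u + var_v
    return -0.5 * math.log2(1 - var_v / ((1 + eps / (var_x - eps)) * var_x))

def upper_bound(var_u, var_v):
    """Right-hand side of Theorem 5, i.e. I(V;X) for Gaussian U and V."""
    return -0.5 * math.log2(1 - var_v / (var_u + var_v))

def worst_case_attack(var_u, var_v, eps):
    """Return (rho, Var Z_perp) of the attack that meets the lower bound."""
    var_x = var_u + var_v
    return -eps / var_x, eps * (1 - eps / var_x)
```

For example, with VarU = 9, VarV = 1 and a distortion budget of 2, the attack
parameters are rho = -0.2 and Var Z_perp = 1.6, and `mi_linear_gaussian`
evaluated at these parameters coincides with `lower_bound`. Increasing VarU
while keeping the budget fixed moves the lower bound toward the upper bound.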

Theorem 5 can be used to give the following robustness criteria of a stego 
scheme in the Gaussian case: 

— One should choose Var(V) as large as allowed by the encoder constraint,
i.e., $\mathrm{Var}(V) = \delta$.

— If VarX and, hence, $E[X^2] = E[U^2] + E[V^2]$ is large compared to the
channel constraint $\varepsilon$, then $I(V;Y) \approx I(V;X)$. Thus, when having the choice
to embed the watermark at particular locations within the cover image, one
should choose those components for embedding that have large $E[U^2]$ (i.e.,
high dynamic range).

These two design criteria for a robust stego scheme are in accordance with the 
findings in [12], where the most robust stego schemes resulted from embedding 
the watermark with maximal allowed strength at locations of edges (with high 
dynamic range) in the image. 

5 Conclusions 

A model for a stego system was presented, which gives a novel characterization 
of the two critical components, the embedding process and the attacker’s mod- 
ification of a stego message. The definition of the two components is based on 
a requirement on the maximum distortion between the cover message, the stego 
message and the modified stego message. This model leads to an information- 
theoretic approach to steganography, which allows one to describe the two basic 
issues, secrecy and robustness, in terms of mutual information. In particular, 
Shannon’s definition of perfect secrecy can be readily extended to steganography.

Based on permutation modulation and on maximum-length sequences, two
classes of perfect stego schemes were proposed. The robustness of these schemes
can be adapted by decreasing the rate of the hidden secret information which,
in turn, increases the robustness.

For some simplistic schemes, in particular, for cover images with statistically 
independent Gaussian components, tight bounds have been established for the 
robustness-related mutual information. These bounds provide a theoretical basis 
for some design criteria that have been derived from experiments in [12].

Abstract. In this paper we first generalize the steganography system
model which Christian Cachin proposed, and specialize it to be suitable
for computer oriented steganography systems. Based on this model,
we introduce a new perfectly secure steganography scheme, one-time
hash steganography, with which one can hide a secret bit in any cover-data
that satisfies a certain condition (partial recomputability). Finally
we prove that there exists a perfectly secure steganography system with
a given cover-data source if and only if the cover-data source is partially
recomputable to its sender.



1 Introduction 

Steganography is the art and science of hiding data in innocent-looking
cover-data so that no one can detect the very existence of the hidden data [2,4].
It differs from cryptography in that its goal is undetectability, not merely
secrecy. For example, a ciphertext may contain peculiar words like
“QJYZQDFLKJ,” but a stego-text (a text file with embedded data) should
read as an ordinary text file so as not to arouse suspicion that it carries a
secret message.

Steganography itself has a long history [ 0], and the recent proliferation of
digital communication has increased the importance of computer based
steganography, with which one can hide the digital communication channel itself.
For example, although an e-mail encryption program can secure the contents of a
mail, it cannot by itself hide the very fact that the mail was delivered. If you
send an encrypted mail to your girlfriend in a hostile country, you might be
arrested as a suspected spy unless you enable the policemen to read the contents.
With a secure steganography system that hides the mail transfer protocol in
another protocol, you can send encrypted mails without the fear of drawing such
suspicion.

This type of application of steganography systems is more serious in the areas
where the use of cryptography is limited by law. Some countries ban unlimited
use of strong cryptography, and retain the right of court-authorized wiretapping.
Some say that enabling court-authorized wiretapping cost-effectively prevents
criminal activities [8]. Others say that it costs much and can make the total
system vulnerable [1]. The importance of steganography appears here. If the
people whose communications are wiretapped could use a secure steganography
system, the effect of court-authorized wiretapping would decrease. If the secure
steganography system were easy to use and could provide enough bandwidth, the
effect of wiretapping would be small. Hidden communication with steganography
systems could be banned, but it would be very hard (or impossible) to
detect and prosecute. Most criminals would use the secure steganography system
for criminal activities and only the criminal communications would be private.
Therefore, when and how a steganography system can be secure is essential to
the measurement of the cost-effectiveness of cryptography restriction policies
such as key escrow policy.

This paper answers both of these questions. Specifically, this paper gives
a necessary and sufficient condition for the existence of a steganography
system that is perfectly secure against passive adversaries, together with a
constructive proof of the sufficiency of the condition. To do this, we first
construct a new model of a steganography system, which is based on the one
Cachin introduced in [6]. Next, we introduce the notion of partial
recomputability. Roughly speaking, a partially recomputable random variable is a
random variable the realization of which can be tweaked without distorting its
distribution. Then, we introduce a perfectly secure steganography system, the
one-time hash steganography system. This system can take an arbitrary message
source as cover-data, provided the message source is partially recomputable.
Finally, we show that if there exists a perfectly secure steganography system,
its cover-data is always partially recomputable. This leads to the conclusion
that there exists a perfectly secure steganography system that takes a given
message source as cover-data if and only if the message source is
partially recomputable.

The paper is organized as follows. Section 2 contains basic definitions.
Section 3 briefly describes related works. Section 4 describes our model of a
steganography system. Section 5 introduces one-time hash steganography, the main
contribution of this paper. Section 6 discusses the condition for the existence
of a perfectly secure steganography system. Section 7 contains conclusions.

2 Terms and Definitions 

2.1 Preliminary Definitions 

Uppercase variables like S, C or E denote random variables and uppercase bold
variables like $\mathbf{M}$, $\mathbf{K}$ or $\mathbf{E}$ denote finite sets of symbols
unless otherwise specified. $\mathbb{Z}$ denotes the set of all integers.
$\mathbb{Z}_{[a,b]}$ denotes $\{x \in \mathbb{Z} \mid a \leq x \leq b\}$. $|\mathbf{M}|$ denotes
the number of elements in M. $X \in \mathbf{M}$ means that any realization of X is one of M.
X = Y means that the distribution of X is equivalent to that of Y. H(X),
H(X|Y) and I(X;Y) denote the entropy of X, the conditional entropy of X
conditioned on Y and the mutual information between X and Y, respectively.



2.2 Basic Model of a Steganography System 

Fig. 1 shows the generally accepted model of a steganography system (or stego- 
system for short), based upon the agreement made at the First International 
Workshop on Information Hiding [12]. The sender of a secret message embeds 



One-Time Hash Steganography 






embedded-data into cover-data using a key, and sends the result, stego-data, to
the recipient. The recipient then extracts the embedded-data from stego-data
using a key that may or may not be equal to the one used in embedding.



[Figure: block diagram with labels Key, Cover-data and Embedded-data on the sender side, and Key and Embedded-data on the recipient side]

Fig. 1. Basic model of a stegosystem



There can be two kinds of attacks against a stegosystem [3]. One is the passive
attack, which aims to detect (and possibly prove to a third person) the existence
of a secret message embedded in stego-data. An attacker who mounts this kind of
attack is called a passive adversary. The other is the active attack, which
modifies the stego-data slightly in order to destroy the embedded-data. An
attacker of this kind is called an active adversary.

In this paper, we consider only passive attacks; the robustness of
the embedded message is out of scope.

3 Related Works 

Cachin made an information-theoretic model of a stegosystem and introduced
the notion of “perfectly secure” as a special case of “$\varepsilon$-secure” [6]. A
stegosystem is $\varepsilon$-secure against passive adversaries if the relative entropy
between the probability distribution of cover-data and that of stego-data is less
than or equal to $\varepsilon$. He calls a 0-secure stegosystem perfectly secure, in which
case both distributions are identical.
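
The security criterion above can be made concrete for finite alphabets. The
following Python sketch computes the relative entropy in bits and tests the
threshold; the direction D(P_C || P_S), the dictionary representation of the
distributions, and the function names are assumptions made for illustration.

```python
import math

def relative_entropy(p_cover, p_stego):
    """D(P_C || P_S) in bits; each argument maps symbols to probabilities."""
    total = 0.0
    for symbol, p in p_cover.items():
        if p == 0:
            continue  # terms with zero cover probability contribute nothing
        q = p_stego.get(symbol, 0.0)
        if q == 0:
            return math.inf  # cover symbol that never occurs as stego-data
        total += p * math.log2(p / q)
    return total

def is_eps_secure(p_cover, p_stego, eps):
    """True if the stego distribution is within relative entropy eps of the cover."""
    return relative_entropy(p_cover, p_stego) <= eps
```

A stegosystem whose stego-data distribution equals the cover-data distribution
has relative entropy 0 and therefore passes the test with eps = 0.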

Zollner et al. analyzed their stegosystem model with an information-theoretic
approach and obtained several conditions for a stegosystem to be perfectly
secure [15]. In particular, they proved that the cover-data of a perfectly secure
stegosystem should not be known to passive adversaries.

In order to send a secret bit, one can generate many cover-data candidates,
select one candidate that has the secret bit as its keyed hash value, and
send that candidate as the real cover-data [3,5]. This “hash and choose” type of
steganography is called the selection method in [5], and is described there as
tantamount to the one-time pad in encryption. The one-time hash steganography we
describe later in section 5 is a provably secure version of this selection method.



4 Generalizing the Basic Model 

As stated in section 1, our primary goal in this paper is to answer the questions
of when and how one can construct a perfectly secure stegosystem. In order to
treat this topic in its most general form, we generalize the basic model of a
stegosystem shown in Fig. 1 as follows.

— The sender may use environmental input other than cover-data and
embedded-data.

• For example, some information about the mail writer is indispensable
to generate or tweak the contents of a mail algorithmically without
introducing unnaturalness. Stegosystems that employ the selection method
require multiple cover-data candidates, which can also be regarded as
additional environmental input to the embedding algorithm.

— The message that the recipient extracts need not be exactly equal to the
original message. In other words, a stegosystem can make errors in our model.



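[Figure: block diagram with inputs V, E, C and key K at the embedder, key K at the extractor, output E', and a passive adversary observing the transmitted message]

Fig. 2. Our model of a stegosystem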



Fig. 2 shows our model of a stegosystem. In this model, a stegosystem consists
of an embedder $\mathcal{E}$, an extractor $\mathcal{D}$ and several random variables that form
inputs and outputs of $\mathcal{E}$ and $\mathcal{D}$.

We denote the embedding process as

\[ S = \mathcal{E}(E, C, K, V) \qquad (1) \]

where $E \in \mathbf{E}$ denotes embedded-data, $C \in \mathbf{M}$ denotes cover-data,
$V \in \mathbf{V}$ denotes environmental-data, $K \in \mathbf{K}$ denotes the shared secret
key, and $S \in \mathbf{M}$ denotes stego-data. Similarly, we denote the extracting
process as

\[ E' = \mathcal{D}(S, K) \qquad (2) \]

where $E' \in \mathbf{E}$ denotes an approximation of the original E.

The goal of a passive adversary is to detect whether the sender is using a
stegosystem or not by eavesdropping on a message (S or C) [6]. The message the
passive adversary eavesdrops on is a realization of either S (in the former case)
or C (in the latter case). The more similar the distribution of S is to that of C,
the harder it is for the passive adversary to detect the use of the stegosystem.
In the extreme case, both distributions become identical and eavesdropping becomes
meaningless. We call this type of stegosystem perfectly secure. It is impossible
for any passive adversary of any computational power to detect the usage of
a perfectly secure stegosystem.

The model we described above is fairly general. For example, under the
conditions of H(V) = 0 and E' = E, the above model is equivalent to the basic
model. Under the conditions of V = R (here R denotes a random number source
private to the sender) and E = E' and some assumptions on conditional
entropies, the above model is equivalent to the one introduced in [6]. It is easy
to apply the above model in the midst of a protocol by defining V to contain the
log of all the symbols transmitted so far.

Now we make some additional assumptions to make the model suitable for
computer based stegosystems and to exclude some absurd situations.

— Both embedder and extractor should be computable algorithms.

— The sender can use a private random number generator.

— All the specifications of the stegosystem (such as the algorithms of the key
generator, embedder and extractor) are known to adversaries.

• In other words, the security of a stegosystem does not depend on the
secrecy of its algorithms. All shared secrets reside in the key.

— E' provides some information about E. In short,

\[ I(E'; E) > 0 \qquad (3) \]

• This implicitly excludes the case of H(E) = 0.

— The secret key K should be generated and shared between the sender and the
recipient securely prior to the transmission of stego-data. (The key sharing
method is out of the scope of this paper.)

— The value of S depends upon K. In short,

\[ H(S \mid CEV) > 0 \qquad (4) \]

• If the computation of S is independent of K, K plays no role in
extracting. In such a case, E' depends upon S only. Since we assume that
the algorithms for extraction and key generation are publicly known,
any passive adversary can compute their own E', which has the same
distribution as the recipient's E'. Therefore, all information that the
recipient gets from stego-data is available to all passive adversaries. We
exclude this situation from our model.

5 One-Time Hash Steganography

5.1 Partial Recomputability 

Let us consider here when one can construct a perfectly secure stegosystem. 

At least the cover-data C (the message to be sent in non-steganographic
communication) should be indeterminable to any passive adversary, since if C is
determinable to a passive adversary, S of any perfectly secure stegosystem is also
determinable to him and the sender cannot alter S to contain any information
about E and K. In order to hide data in the indeterminable part of the
cover-data, it is desirable for the sender to know what causes the
indeterminability. For example, if C = f(R, X) holds for certain random variables
R, X and a function f, no adversary can get any information about R, and the
sender knows something about R, it is likely that the sender can modify or
regenerate R and recompute f(R, X) again and again to use the selection method
(see section 3).

Based on above consideration, we define the following condition. 

Definition 1. A random variable C is partially recomputable to an entity P
if and only if there exist random variables $X, R_1, R_2, \ldots, R_{n_r}$ ($n_r \geq 2$) and a
function f such that for all $i \in \mathbb{Z}_{[1,n_r]}$

\[ f(C, R_i, X) = C \qquad (5) \]

\[ H(f(C, R_i, X) \mid CX) > 0 \qquad (6) \]

hold and P can get all realizations of $X, R_1, R_2, \ldots, R_{n_r}$ and
$R_1, R_2, \ldots, R_{n_r}$ are mutually independent random variables private to P.

The crucial point of partial recomputability is that if a random variable is 
partially recomputable to an entity P, P can replace the realization of the random 
variable in probabilistic way without distorting the distribution of the random 
variable. 

Following are some examples of partially recomputable random variables. 
(Here we always assume that the one who tries to recompute the random vari- 
ables has private random number generator). 

Example 1. Any distribution-known random variable $C \in \mathbf{M}$ is partially
recomputable if H(C) > 0 holds, since one can mimic the random variable with
his private random number generator (see [7] for details) and the condition
H(C) > 0 guarantees that the output of the random number generator affects
the recomputed value of C (i.e., equation (6) holds).

Example 2. Any random variable $C \in \mathbf{M}$ is partially recomputable if there exist
two symbols $a, b \in \mathbf{M}$ such that both probabilities Pr(C = a) and Pr(C = b)
are known and positive. One can use the following function to recompute C. (We
assume here that p = Pr(C = a) < q = Pr(C = b). $R \in \{0,1\}$ is a random
variable such that Pr(R = 1) = p/q holds; v is a dummy input.)

\[ f(c, r, v) = \begin{cases} c & \text{if } c \notin \{a, b\} \\ b & \text{if } c = a \vee (c = b \wedge r = 0) \\ a & \text{if } c = b \wedge r = 1 \end{cases} \qquad (7) \]

Example 3. Any digital signature that requires a random number to create is
partially recomputable to its signer. To recompute a signature, the signer has
only to compute his signature again with a new random number, which may or
may not be equal to the one used in the computation of the original signature.
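
The recomputation function (7) of Example 2 can be checked exactly with rational
arithmetic. The sketch below is illustrative only: the three-symbol cover
distribution and the function names are assumptions, and the dummy input v is
omitted.

```python
from fractions import Fraction

def recompute(c, r, a, b):
    """Function f of (7) without the dummy input v."""
    if c not in (a, b):
        return c
    if c == a or r == 0:   # c = a, or (c = b and r = 0)
        return b
    return a               # c = b and r = 1

def output_distribution(dist, a, b):
    """Exact distribution of f(C, R, v) when Pr(R = 1) = Pr(C = a) / Pr(C = b)."""
    pr_r1 = dist[a] / dist[b]
    out = {symbol: Fraction(0) for symbol in dist}
    for c, pc in dist.items():
        for r, pr in ((0, 1 - pr_r1), (1, pr_r1)):
            out[recompute(c, r, a, b)] += pc * pr
    return out

# Hypothetical cover distribution with p = Pr(C = "a") < q = Pr(C = "b").
cover = {"a": Fraction(1, 6), "b": Fraction(1, 2), "c": Fraction(1, 3)}
recomputed = output_distribution(cover, "a", "b")
```

Here `recomputed` equals `cover`, which is condition (5); the output still
depends on r whenever c = b, which is what condition (6) asks for.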

5.2 Definition of One-Time Hash Stegosystem 

Based on the above definition, here we introduce the one-time hash stegosystem
(or OTH stegosystem for short), which can hide an embedded-data bit
$E \in \mathbf{E} = \{0,1\}$ of any distribution in any cover-data $C \in \mathbf{M}$ that is
partially recomputable to its sender. The principal idea of the OTH stegosystem
is fairly simple: the sender generates some candidates for the cover-data, hashes
each of them with the shared one-time hash, chooses one candidate according to the
hash values and the secret bit to transmit, and sends it to the recipient. The
recipient applies the shared one-time hash and gets an approximation of the
original secret bit.

The key K of the OTH stegosystem should be uniformly random in $\{0,1\}^{|\mathbf{M}|}$.
Note that the key should never be reused.

Following is the embedding algorithm of the OTH stegosystem written in a
Pascal-like pseudo language. Here we assume that cover-data C is partially
recomputable as $C = f(C, R_i, X)$ ($i \in \mathbb{Z}_{[1,n_r]}$). The numbering function g
is an injection from M to $\mathbb{Z}_{[1,|\mathbf{M}|]}$ and is shared between embedder and
extractor. $n \in \mathbb{Z}_{[2,n_r]}$ is the number of recomputations and can be chosen
arbitrarily. (In general, increasing this value makes the embedding process take
longer and the error rate lower.) Both n and g are part of the algorithm and can
be public.

Input

    embedded-data                             e ∈ {0,1}
    cover-data                                c ∈ M
    key                                       k = b_1 b_2 ... b_|M| ∈ {0,1}^|M|
    environmental inputs:
      samples of recomputable part
      (realizations of R_1, R_2, ..., R_n)    r_1, r_2, ..., r_n
      environmental input for recomputation
      (realization of X)                      x

Output

    stego-data                                s ∈ M

Embedding Steps

     1: count_0 := 0; count_1 := 0
     2: for i := 1 to n do
     3:   a_i := f(c, r_i, x)
     4:   if (b_g(a_i) = 0) then
     5:     pool[0, count_0] := a_i; count_0 := count_0 + 1
     6:   else
     7:     pool[1, count_1] := a_i; count_1 := count_1 + 1
     8:   endif
     9: endfor
    10: if count_0 < count_1 then greater-half := 1 else greater-half := 0 endif
    11: count_min := min(count_0, count_1)
    12: Choose a random number r ∈ Z_[0,n-1]
    13: s := pool[0, ⌊r/2⌋]                     (if r < 2 count_min ∧ e = 0)
             pool[1, ⌊r/2⌋]                     (if r < 2 count_min ∧ e = 1)
             pool[greater-half, r - count_min]  (if 2 count_min ≤ r)

The extracting algorithm of the OTH stegosystem is as follows.

Input

    key                                       k = b_1 b_2 ... b_|M| ∈ {0,1}^|M|
    stego-data                                s ∈ M

Output

    extracted data                            e' ∈ {0,1}

Extracting Step

     1: e' := b_g(s)
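
The two algorithms can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not the author's implementation: the
four-symbol alphabet, its weights, the 0-based numbering `INDEX` standing in for
g, and the choice of recomputation function (fresh samples from a known cover
distribution, as in Example 1) are all hypothetical.

```python
import random

SYMBOLS = ["w", "x", "y", "z"]                  # cover alphabet M (illustrative)
WEIGHTS = [4, 2, 1, 1]                          # known cover distribution, up to scale
INDEX = {m: i for i, m in enumerate(SYMBOLS)}   # numbering function g, 0-based here

def f(c, r, x):
    """Recomputation as in Example 1: r is a fresh private sample of the source."""
    return r

def embed(e, c, key, rs, x, rng):
    pool = {0: [], 1: []}
    for r_i in rs:                               # steps 2-9: hash every candidate
        a = f(c, r_i, x)
        pool[key[INDEX[a]]].append(a)
    count0, count1 = len(pool[0]), len(pool[1])
    greater_half = 1 if count0 < count1 else 0   # step 10
    count_min = min(count0, count1)              # step 11
    r = rng.randrange(len(rs))                   # step 12: r in {0, ..., n-1}
    if r < 2 * count_min:                        # step 13
        return pool[e][r // 2]
    return pool[greater_half][r - count_min]

def extract(key, s):
    return key[INDEX[s]]

def simulate(trials=4000, n=8, seed=1):
    """Return the fraction of correctly extracted bits and the stego-data frequencies."""
    rng = random.Random(seed)
    correct = 0
    freq = {m: 0 for m in SYMBOLS}
    for _ in range(trials):
        key = [rng.randrange(2) for _ in SYMBOLS]   # fresh one-time key per bit
        e = rng.randrange(2)
        c = rng.choices(SYMBOLS, WEIGHTS)[0]
        rs = rng.choices(SYMBOLS, WEIGHTS, k=n)
        s = embed(e, c, key, rs, None, rng)
        correct += extract(key, s) == e
        freq[s] += 1
    return correct / trials, {m: k / trials for m, k in freq.items()}
```

In such a simulation the stego-data frequencies should stay close to the cover
distribution, while the extracted bit agrees with the embedded bit more often
than not; both effects are analyzed in Sections 5.3 and 5.4.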



5.3 Security of One-Time Hash Stegosystem 

Theorem 1. One-time hash stegosystem is perfectly secure.

Proof. At the entrance of step 13 in the embedding steps, the variables
pool[0,0], pool[0,1], ..., pool[0, count_0 - 1], pool[1,0], pool[1,1], ...,
pool[1, count_1 - 1] are permuted realizations of $f(C, R_i, X)$
($i \in \mathbb{Z}_{[1,n]}$). Let us consider $f(C, R_i, X)$ for a certain
$i \in \mathbb{Z}_{[1,n]}$. The realization of $f(C, R_i, X)$ must be placed somewhere in
pool at the entrance of step 13. Let pool[j, k] denote the place. The probability
that the realization of $f(C, R_i, X)$ is selected as stego-data is, if
$k < \mathit{count}_{\min}$,

\begin{align*}
\Pr(e = j \wedge k = \lfloor r/2 \rfloor) &= \Pr(e = b_{g(f(C, R_i, X))}) \times \Pr(k = \lfloor r/2 \rfloor) \qquad (8) \\
 &= \frac{1}{2} \cdot \frac{2}{n} \qquad (9) \\
 &= \frac{1}{n}, \qquad (10)
\end{align*}

otherwise

\[ \Pr(r = \mathit{count}_{\min} + k) = \frac{1}{n}. \qquad (11) \]

In short, this probability is 1/n regardless of the position. Consequently,

\begin{align*}
\forall m \in \mathbf{M}: \quad \Pr(S = m) &= \sum_{i=1}^{n} \frac{1}{n} \Pr(f(C, R_i, X) = m) \qquad (12) \\
 &= \sum_{i=1}^{n} \frac{1}{n} \Pr(C = m) \qquad (13) \\
 &= \Pr(C = m) \qquad (14)
\end{align*}

holds (from the partial recomputability condition). This means S = C: the
perfect security.

Note that if the definition of a cover-data implies partial recomputability (i.e.,
equations (5) and (6) hold regardless of any external condition), an OTH
stegosystem that takes the cover-data is perfectly secure against any passive
adversary regardless of the knowledge of the adversary, provided that the
realizations of $R_i$, K and the outputs from the sender’s random number generator
are kept secret.



5.4 Bandwidth of One-Time Hash Stegosystem 

Though OTH stegosystem can make error and E' is not always equal to E, it 
contains some information about E. 

Theorem 2. In the one-time hash stegosystem, I(E; E') > 0 always holds.

Proof. The error $E' \neq E$ occurs if and only if
$2\,\mathit{count}_{\min} \leq r \wedge \textit{greater-half} \neq E$
holds at step 13 of the embedding steps. Therefore, the error probability is

\begin{align*}
\Pr(E' \neq E) &= \Pr(2\,\mathit{count}_{\min} \leq r \wedge \textit{greater-half} \neq E) \qquad (15) \\
 &= \frac{n - 2\,\mathit{count}_{\min}}{n}\,\Pr(\textit{greater-half} \neq E) \qquad (16) \\
 &= \frac{1}{2} - \frac{\mathit{count}_{\min}}{n}. \qquad (17)
\end{align*}

(If $2\,\mathit{count}_{\min} < n$ holds, $\Pr(\textit{greater-half} \neq E)$ is always 1/2, since the
key bits $b_1 b_2 \cdots b_{|\mathbf{M}|}$ are randomly generated and independent of E. Otherwise,
$\Pr(E' \neq E)$ is 0 and the above equation holds anyway.) From the conditions
of partial recomputability, $H(f(C, R_i, X) \mid CX) > 0$ holds for all
$i \in \mathbb{Z}_{[1,n]}$ and all random variables $R_i$ ($i \in \mathbb{Z}_{[1,n]}$) are mutually
independent. This means that
Pr(pool[0,0] = pool[0,1] = ... = pool[0, count_0 - 1] = pool[1,0] = pool[1,1] = ...
= pool[1, count_1 - 1]) < 1. From this and the fact that the key bits
$b_1 b_2 \cdots b_{|\mathbf{M}|}$ for hashing are randomly chosen, $\Pr(\mathit{count}_{\min} = 0) < 1$
holds. This means $\Pr(E' \neq E) < 1/2$ and $\Pr(E' = E) = 1 - \Pr(E' \neq E) > 1/2$.
Therefore, E' is not independent of E and I(E; E') > 0 holds.
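
Equations (15) to (17) can be verified by exhaustive counting over r. The sketch
below assumes, as in the proof, that the key bits are uniform, so that a pool
labeling and its mirror image (all key bits flipped) are equally likely; the
function names are hypothetical.

```python
from fractions import Fraction

def error_fraction(count0, count1, e):
    """Fraction of r in {0, ..., n-1} for which step 13 outputs a wrongly hashed candidate."""
    n = count0 + count1
    count_min = min(count0, count1)
    greater_half = 1 if count0 < count1 else 0
    errors = sum(1 for r in range(n) if r >= 2 * count_min and greater_half != e)
    return Fraction(errors, n)

def averaged_error(count0, count1, e):
    """Average over the two equally likely pool labelings (key and flipped key)."""
    return (error_fraction(count0, count1, e) + error_fraction(count1, count0, e)) / 2

def formula_17(count0, count1):
    """Right-hand side of (17): 1/2 - count_min / n."""
    return Fraction(1, 2) - Fraction(min(count0, count1), count0 + count1)
```

For pool sizes 2 and 6, for example, both `averaged_error` and `formula_17`
give 1/4, and when the two pools have equal size the error probability is 0.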



This means that equation (3) in our assumption always holds. It is easy to see
that the OTH stegosystem conforms to the other assumptions we made in section 4.

The above theorem guarantees that the channel capacity of the transmission
channel the OTH stegosystem provides is always positive. Therefore, the sender can
encode his message with an error-correcting code and send the result using the OTH
stegosystem multiple times so as to lower the error probability.

Since any digital signature scheme that uses a random number is partially
recomputable to its signer (see Example 3), the OTH stegosystem can take any such
digital signature as cover-data. This can be regarded as a subliminal channel [14].
Though the bandwidth of this channel is fairly narrow compared to the
bandwidths of those that cleverly exploit the redundancy of each signature
scheme [13], this channel is perfectly secure even if the underlying signature
scheme is vulnerable to the adversaries or if the signature function is not an
injection from the random number input to the signature.



6 Necessary and Sufficient Condition for the Existence of 
Perfectly Secure Stegosystem 

In the previous section, we showed that the partial recomputability of cover-data
is a sufficient condition for the existence of a perfectly secure stegosystem. In
this section, we consider the necessary condition for the existence of a perfectly
secure stegosystem.

Suppose that there exists a perfectly secure stegosystem. Then,

\begin{align}
C &= S \tag{18} \\
  &= \mathcal{E}(E, C, K, V) \tag{19}
\end{align}

holds. Let $n_r$ be an arbitrary integer greater than one. Defining $f$ as

$$f(C, K, (E, V)) = \mathcal{E}(E, C, K, V) \tag{20}$$

the first condition for partial recomputability (equation (5)) is fulfilled, since

$$f(C, R_i, (E, V)) = C \tag{21}$$

holds for all $i \in Z_{[1,n_r]}$. (Here $R_i$ ($i \in Z_{[1,n_r]}$) denote the key candidates
generated using the sender’s private random number generator, and $R_i = K$ holds
for all $i \in Z_{[1,n_r]}$.) From equation (4),

$$H(f(C, R_i, (E, V)) \mid CEV) = H(S \mid CEV) > 0$$

holds. Therefore, $C$ is partially recomputable to the sender.

From the above fact and the existence of OTH steganography, the following
theorem holds.

Theorem 3. For any message source C, there exists a perfectly secure stegosystem
that takes C as its cover-data if and only if C is partially recomputable to
its sender.

Note that partial recomputability is not a necessary condition for the existence
of the perfectly secure stegosystems that we excluded in the construction of
our model. For example, even if cover-data is not partially recomputable to
its sender, there can still exist perfectly secure stegosystems that use an
external input of unknown distribution as their key. Such stegosystems are outside
the scope of this paper.

7 Conclusions 

In this paper, we generalized the steganography model described in [6] and imposed
some conditions to make the model suitable for computer-based steganography
systems. Based on this model, we introduced a perfectly secure steganography
system, the one-time hash steganography system, which can take any cover-data
that is partially recomputable to the sender. Finally, we proved that the partial
recomputability of cover-data to its sender is a necessary and sufficient
condition for the existence of a perfectly secure steganography system.



Acknowledgments 

The author greatly appreciates Prof. Yasumasa Kanada, who awoke the author 
to the necessity of steganography. 

References 

1. H. Abelson, R. Anderson, S. Bellovin, J. Benaloh, M. Blaze, W. Diffie, J. Gilmore,
P. Neumann, R. Rivest, J. Schiller, and B. Schneier, “The Risks of Key Recovery,
Key Escrow, and Trusted Third-Party Encryption,” World Wide Web Journal, v.2,
n.3, 1997, pp. 241-257

2. Ross Anderson (Ed.), “Information Hiding,” First International Workshop IH’96
Proceedings, Lecture Notes in Computer Science 1174, Springer, 1996

3. Ross Anderson, “Stretching the Limits of Steganography,” Lecture Notes in
Computer Science 1174, Springer, 1996, pp. 265-278

4. David Aucsmith (Ed.), “Information Hiding,” Second International Workshop
IH’98 Proceedings, Lecture Notes in Computer Science 1525, Springer, 1998

5. Tuomas Aura, “Practical Invisibility in Digital Communication,” Lecture Notes in
Computer Science 1174, pp. 265-278

6. Christian Cachin, “An Information-Theoretic Model for Steganography,” Lecture
Notes in Computer Science 1525, Springer, 1998, pp. 306-318

7. Thomas M. Cover, Joy A. Thomas, “Elements of Information Theory,” John Wiley
& Sons, Inc., New York, 1991

8. Silvio Micali, “Fair Public-Key Cryptosystems,” Advances in Cryptology -
CRYPTO’92 Proceedings, Lecture Notes in Computer Science 740, Springer, 1993,
pp. 113-118

9. Neil F. Johnson and Sushil Jajodia, “Steganography: Seeing the Unseen,” IEEE
Computer, February 1998, pp. 26-34






10. David Kahn, “The History of Steganography,” Lecture Notes in Computer Science
1174, Springer, 1996, pp. 1-7

11. Michiharu Niimi, Hideki Noda, and Eiji Kawaguchi, “An Image Embedding in
Image by a Complexity Based Region Segmentation Method,” Proc. ICIP’97, Vol.3,
pp. 74-77, (1997-10)

12. (Collected by) Birgit Pfitzmann, “Information Hiding Terminology,” Lecture Notes
in Computer Science 1174, Springer, 1996, pp. 347-350

13. Bruce Schneier, “Applied Cryptography,” John Wiley & Sons, 1996 (second
edition)

14. Gustavus J. Simmons, “The Subliminal Channel and Digital Signatures,”
EUROCRYPT’84, Lecture Notes in Computer Science 209, Springer, 1985

15. J. Zollner, H. Federrath, H. Klimant, A. Pfitzmann, R. Piotraschke, A. Westfeld,
G. Wicke, G. Wolf, “Modeling the Security of Steganographic Systems,” Lecture
Notes in Computer Science 1525, Springer, 1998, pp. 344-354



Steganography Secure against Cover-Stego-Attacks 



Elke Franz and Andreas Pfitzmann 

Dresden University of Technology, Department of Computer Science, 
D-01062 Dresden 

{ef1, pfitza}@inf.tu-dresden.de



Abstract. Steganography aims to secretly transmit messages by embedding
them in cover data. The usual criterion each stegosystem must meet is to resist
stego-only-attacks. An even stronger criterion is to resist cover-stego-attacks.
The article introduces a stego paradigm which aims to meet this stronger
requirement by simulating a “usual process” of data processing. The general
realization of the paradigm is discussed. One possible realization is sketched.



1 Discussion of the Stego Paradigm 

1.1 Goal of Steganography 

One concern of privacy in everyday life is that one can communicate confidentially.
However, modern electronic communication does not provide the same conditions as
everyday communication. If one sends a message by email, for example, the message
is as public as a postcard. One can prevent this by cryptography. However, various
governmental institutions have discussed restricting cryptography in recent years.
On the one hand, this makes life more difficult for criminals; on the other hand,
it is an offence against privacy.

Steganography is a way to communicate confidentially despite all restrictions:
The secret message is imperceptibly embedded in other, harmless-looking data. Thus,
the mere existence of a secret message is hidden.

The main goal of steganography is to embed data in such a way that it cannot be 
detected. Moreover, it should be possible to embed as much data as possible. 

Steganography is also used for watermarking systems. These systems embed in- 
formation about the copyright holder into digital works that are expected to be used 
without authorization. In contrast to classical steganography, the existence of an em- 
bedded watermark may be known but it must not be possible to remove this water- 
mark. 



1.2 Possible Attacks 

The discussion of possible attacks and their success is necessary to evaluate the
security of a system. In the case of steganography, the goal of the attacker is to
detect information hiding. In the worst case, he can extract the embedded text with
high probability. Therefore, a stegosystem is called secure if it outputs stego
data that do not even arouse suspicion about the presence of any embedded data.

We assume the use of a key for both embedding and extracting. This key is only
known to the users. The attacker must find out the key and the stegosystem used.
Consequently, he could find several possibilities to extract a message. However, if
he extracts a plausible message, it is very likely that he has succeeded. As in the
case of a ciphertext-only-attack in cryptography, it is not possible to gain
absolute certainty about the success of the attack. But in practice, a probability
close to one will be sufficient. The probability of falsely accusing anybody should
be reasonably small. Additional measures like monitoring further actions of the
people involved may strengthen suspicion. But this is beyond the scope of this
paper and, therefore, will not be discussed further.

Power and knowledge of the attacker must be considered to describe possible at- 
tacks comprehensively. The power of the attacker can be described as follows: 

• A passive attacker is only able to analyze the data he could intercept. 

• An active attacker is allowed to modify the data. 

Steganography mainly considers passive attacks as pointed out in [1]. [2] gives exam- 
ples for active attacks. However, the discussion of active attacks is most common for 
watermarking systems [5]. 

The knowledge of the attacker describes which data and which details of the system 
such as algorithms and keys are known to him. 

Usually, one assumes that the attacker can access only the stego data, i.e. he can 
analyze or even manipulate them (stego-only-attack). However, attacks on stegosys- 
tems should be discussed as comprehensively as in cryptography, because it cannot be 
excluded that the attacker is more knowledgeable or more powerful. 

For example, one can imagine that the user of a stegosystem has not deleted the
cover used. The cover is still stored on his computer, which is possibly not
protected from access via the net. An attacker can try to spy out the stored cover
(cover-stego-attack). In the same way, the embedded data (emb) could also be spied
out if they were not deleted immediately (emb-stego-attack). Combined, this gives a
cover-emb-stego-attack.

Moreover, the attacker could be able to manipulate the cover. Imagine that the
attacker knows that the user prefers, as covers, interesting images he has found on
web sites. Now the attacker publishes some very interesting images on a site and
makes this site known to the user. It is at least possible that the user downloads
some of these images and uses them as covers. This way, the attacker can try to
suggest the use of images which make stego analysis easier. Even if the stegosystem
rejects covers that are not suitable for embedding, it will surely be possible to
find covers which will not be processed optimally and, therefore, are useful for
the attacker’s analysis.

The attacker could also try to manipulate emb in order to change its distribution.
Suppose emb is compressed or encrypted before embedding. This step results in
a special distribution of emb that may be optimal for embedding, whereas the
embedding would yield suspicious stego data for covers with another distribution.
The attacker could try to manipulate the distribution by suggesting the use of
another compression or encryption. Therefore, the preprocessing of emb should be
part of the stegosystem to prevent the manipulation. This way, such an attack is no
longer possible and will not be discussed further.




Fig. 1. Possible attacks on a stegosystem



Fig. 1 shows a common model of a stegosystem [6, 7] and possible attacks on the
systems. Regarding steganography, the following attacks can be distinguished:

Passive attacks: 

• stego-only-attack: Attacker analyzes the intercepted stego data. 

• stego*-attack: The user has repeatedly embedded in the same cover, and the
attacker has intercepted the resulting stegos. Of course, this should not happen,
but it cannot be ruled out with certainty.

• cover-stego-attack: In addition to the intercepted stego, the attacker gets to
know the cover.

• emb-stego-attack: The attacker knows both emb and stego. 

• cover-emb-stego-attack: The attacker knows cover, emb, and stego. 

Active attacks: 

• Manipulating stego: Stego can be manipulated to prevent the transmission of the 
embedded message. On the one hand, this attack foils the secret communication. 
On the other hand, the attacker can analyze the reaction of attacked parties: If they 
try to send possible stego data again it could be a sign that steganography is used. 

• Manipulating cover: The attacker can try to make his attack easier as described
above. In addition to the manipulated cover, he uses the intercepted stego for his
analysis.

As described above, there is a variety of possible attacks. For further
investigation, we have chosen the cover-stego-attack and describe a paradigm that
resists this attack. It will turn out that the paradigm also resists the
stego*-attack and, moreover, even the cover-emb-stego-attack.
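
The passive attacks distinguished above differ only in what the attacker knows in addition to the stego data. The following small sketch makes this classification explicit. The function name, its parameters, and the order in which the cases are checked (for example, treating knowledge of the cover as taking precedence over the number of intercepted stegos) are illustrative assumptions and not part of the model.

```python
def classify_passive_attack(knows_cover: bool, knows_emb: bool,
                            stegos_from_same_cover: int = 1) -> str:
    """Name the passive attack from the attacker's knowledge.

    The attacker is always assumed to hold at least one intercepted stego.
    """
    if stegos_from_same_cover < 1:
        raise ValueError("every attack considered here includes a stego")
    if knows_cover and knows_emb:
        return "cover-emb-stego-attack"
    if knows_cover:
        return "cover-stego-attack"
    if knows_emb:
        return "emb-stego-attack"
    if stegos_from_same_cover > 1:
        return "stego*-attack"
    return "stego-only-attack"


if __name__ == "__main__":
    print(classify_passive_attack(knows_cover=True, knows_emb=False))
```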

Another classification of attacks can be found in [4].



1.3 Introduction of the Stego Paradigm 



As shown in [7], deterministic steganography cannot resist cover-stego-attacks. Gen- 
erally, it is necessary to add indeterminism to the embedding. [7] proposes to use a set 
of possible covers (called Source ) to make the system indeterministic: The attacker 
cannot decide which of the possible covers was really used. 

This article presents another approach: The stegosystem simulates a usual process 
that modifies data. There are differences between the input and the output of the proc- 
ess which must be used to embed the secret message. Indeterminism means in this 
case that the attacker cannot decide whether the differences between intercepted data 
are caused by the stegosystem or by the simulated process (Fig. 2). The former means 
that data have been embedded, the latter means that this is not the case. 




In general, there are no restrictions on the usual process. In the case of a
deterministic process, different output data are produced by different parameters
or other influences. An indeterministic process creates different output data even
if an event is repeated under nearly the same conditions (the same input data,
almost the same circumstances).

The simulation of a usual process provides steganography that is resistant even to
cover-stego-attacks. Assume an exact simulation of a usual process: every cover
will be processed by the simulation as it would be by the original process. The
input and output of the stegosystem and of the usual process correspond to each
other. Of course, the differences between the input and output of the usual process
will not be suspicious to an attacker. Therefore, the differences caused by the
simulation will not be suspicious either.




Fig. 3. Comparing possibilities to resist cover-stego-attacks 



Fig. 3 compares this paradigm to the method proposed in [7]. Even if the attacker 
knows all data outside the gray area, a cover-stego-attack will not be successful be- 
cause the processing inside this area is uncertain to him. Additionally, this is even 
valid regarding the knowledge of emb if emb and stego are stochastically independent. 
This requirement can be met by encryption of emb before embedding. As already 
mentioned, this preprocessing of emb should be part of the stegosystem. Then, the 
stegosystem can even resist cover-emb-stego-attacks. 



2 Realization of the Paradigm 

2.1 Process and Data 

A concrete process must be chosen to realize the paradigm. A deterministic process 
would be easier to simulate because its behavior is determined. But the set of possible 
outputs is limited, because plausible parameters must be used. Therefore, the possible 
outputs may not supply suitable differences for embedding data. The cover must be 
rejected in this case. This is not critical as long as the possible covers are not limited 
too much. Otherwise, the stego data would be suspicious anyway. 

An indeterministic process presents a larger set of possible output data. It is more
likely that the differences between the input data and some of these output data are
suitable for embedding. Of course, it is also possible that a special cover cannot be
used. But we think that the set of possible covers will not be limited too much. Thus
we focus on an indeterministic process.

As an example of an indeterministic process we discuss the scan process. There are
some special features which must be considered:

• The input of the scanning is an analogue image. Repeated scanning of this image 
while using the same parameters yields a set of different digital images, which is 
caused by the indeterminism of the scanning. 

• The input of the stegosystem is an already digitized image, i.e. it is a possible output 
of the scan process. Therefore, we have the special case that both input and output 
data of the stegosystem must belong to the possible output data of the scan process 
(Fig. 4). 

• This, however, presents just another possibility to realize the stegosystem: Instead
of mimicking the differences between input and output, the stegosystem can also
mimic the differences between possible outputs. This has the advantage that the
input and output of the stegosystem are completely digital.

• If the differences between cover and stego correspond to differences between
scanned images, an attacker cannot decide whether the differences were caused by
the scan process or by the embedding. This requirement should be met for any pair
of data the attacker can intercept, and in particular it should hold even if the
attacker could intercept more than two images. Therefore, the system is resistant
not only to cover-stego-attacks but also to stego*-attacks.

Fig. 4. A stegosystem which simulates the scan process



The problem which must be solved is to describe the scan process as exactly as 
possible. The stegosystem must process the permissible covers correctly by using this 
description: It takes the cover and generates differences which could be found between 
repeated scans while hiding the data to be embedded within these differences. If a 
cover is not suitable for embedding, the stegosystem should reject it. 

The differences between repeated scans are especially important since cover and
stego must be possible outputs of the scan process. These differences must be
mimicked by the stegosystem. A description of the scan process, however, describes
possible effects in a single image. (And thereby, possible effects in most of the
images which can be found on computers, because they had to be digitized, too.)
Nevertheless, such a description will reveal possible differences between images
scanned: Parts of the image which are digitized more indeterministically are likely
to be more different from image to image than parts which are digitized rather
deterministically.

We have to assume that the attacker knows the scan process: The description used 
by the stegosystem to simulate the scan process must be at least as good as the de- 
scription of the scan process known to the attacker. Otherwise, the attacker could be 
able to distinguish between the two processes. 



2.2 Possible Levels of the Stegosystem 

There are several levels of generality to describe the stegosystem: 

1. The lowest level is a stegosystem that describes possible differences specifically
for each cover to be used. Before embedding, the potential cover must be scanned
several times. The stegosystem evaluates the differences between the digitized
images in order to derive the stego image as a digital image which might result
from another scan.

(+) Possible differences are described very exactly for each cover.

(-) It is a very expensive realization. Each cover requires repeated scanning and a
special evaluation.

(-) The simulation of the scan process is very limited, as only the differences
between repeated scans of one special cover are described.

2. A more elaborate solution is given by the second level. The possible differences
between repeated scans are described for a set of images. Each cover of the set
described can be processed correctly by the stegosystem: The resulting stego image
will include the data to be embedded and it will belong to the set of possible
output data. The sets described can be limited in very different ways. Therefore,
a lot of stegosystems are possible on this level. Limitations can concern e.g. the
material and quality of the analogue image, the structure of the image, and the
number of colors.

(+) The more covers are permissible, the more flexible the stegosystem is.

(+) It is a real simulation of the scan process, even though only for a limited set
of covers.

(-) The more covers are permissible, the more extensive the necessary description
is.

(-) It cannot be guaranteed that every cover of the set described will be processed
correctly: Because not every single cover is evaluated, a cover may represent a
special case.

3. The highest level is a stegosystem that meets the theoretical ideas of the
realization of the stego paradigm. It can correctly process any cover.

(+) This stegosystem is the most flexible one.

(-) This system needs a universally valid description of all properties of the
scanner, which has not been found yet. It is not even clear whether such a
description can be found, or whether an attacker could still find covers which are
not covered by the description. Therefore, it is not yet clear whether a
stegosystem of the third level is feasible at all.

A stegosystem of the first level is surely feasible, and it may be the first step
towards defining a stegosystem of the second level. We want to define a stegosystem
of the second level: Based on a limited set of possible covers, the behavior of the
scanner is to be described. The set contains grayscale images scanned from black and
white photos. Further restrictions of the covers (e.g. regarding the structure of
the pictures) may become necessary, but they are not mentioned at this point.
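
The first level can be illustrated with a minimal sketch. It is not the stegosystem developed here but a simplification under stated assumptions: the repeated scans are aligned grayscale images given as equally long lists of pixel values, a pixel of the stego image may take any value that was observed at that position in some scan, and a message bit is carried by the parity of the chosen value. Key-dependent selection of pixels and the encryption of emb are omitted, the list of usable positions is assumed to be available to the extractor, and pixels are treated independently of their neighbors. The names `embed_level1` and `extract_level1` are hypothetical.

```python
def embed_level1(scans, bits):
    """Derive a stego image from repeated scans of one cover.

    scans: list of aligned scans, each a list of integer gray values;
           scans[0] is used as the cover.
    bits:  message bits (0 or 1) to embed.
    Returns (stego, usable_positions). Raises ValueError if the cover
    offers too few usable pixels and therefore has to be rejected.
    """
    cover = scans[0]
    stego = list(cover)
    # A pixel is usable if the values observed there include both parities.
    usable = [
        pos for pos in range(len(cover))
        if len({scan[pos] % 2 for scan in scans}) == 2
    ]
    if len(usable) < len(bits):
        raise ValueError("cover rejected: not enough usable pixels")
    for bit, pos in zip(bits, usable):
        candidates = [v for v in {scan[pos] for scan in scans} if v % 2 == bit]
        # Among the observed values with the right parity, stay close to the cover.
        stego[pos] = min(candidates, key=lambda v: (abs(v - cover[pos]), v))
    return stego, usable


def extract_level1(stego, usable_positions, n_bits):
    """Read the message bits back from the parities of the used pixels."""
    return [stego[pos] % 2 for pos in usable_positions[:n_bits]]


if __name__ == "__main__":
    scans = [[10, 20, 30, 41], [11, 20, 31, 41], [10, 20, 30, 40]]
    stego, usable = embed_level1(scans, [1, 0])
    print(stego, extract_level1(stego, usable, 2))
```

In this sketch every stego pixel equals a value that some scan produced at the same position, which is the sense in which the stego image "might result from another scan"; a realistic description of the differences would also have to capture dependencies between pixels.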



3 Description of the Scan Process 

3.1 Scanner Type and Principles 

Several types of scanners are in use. The flatbed scanner is chosen as a model because
of its widespread use.

The scan process can be generally described as follows:

1. Scanning:

The analogue image is scanned by a CCD-line (charge coupled device). This CCD- 
line and the necessary optics are fixed on a block which will vertically be moved 
over the image while scanning. 

The analogue image is put on a thick glass pane. While scanning, the image is ex- 
posed. The image reflects the light so that it shines on the CCD elements. The 
CCDs convert the incoming light into analogue voltage. 

2. Processing inside the scanner: 

These analogue values are possibly edited by a preprocessor and then digitized. The 
scanning result can be improved by further operations, e.g. to reduce streaks (see 
3.3) or to optimize the range for conversion. Such operations are performed by the 
firmware of the scanner. 

3. Transmission: 

Finally, the digital data are transmitted to the computer. 

4. Processing outside the scanner:

Finally, the data can be modified by the computer according to the user’s wishes,
e.g. by changing the brightness or the contrast.

Special software is necessary for scanning. Some of its tasks are to give a preview
of the scanned image and to select the image areas to be scanned. Settings may
also allow the user to choose the image type, the resolution, or the exposure. These
settings can also be chosen automatically by the firmware of the scanner.

The precision of the scan depends on the positioning of the block with the CCD
elements, the precision of the CCD elements (with respect to reproducibility), and
the processing of the values scanned, especially the analogue-digital conversion.
Overall, mechanics and electronics, but also the processing of the data, are
important.

The possible horizontal resolution of the scanner is determined by the number of
CCD elements, i.e. it is a function of the electronics. The possible vertical
resolution is determined by the positioning of the scanning block, i.e. it is a
mechanical problem.



3.2 Indeterministic Parts of the Scan Process

The repeated scanning of one image will result in digital images that look alike to
the human eye. However, even if the same settings were used while scanning, the
individual pixels of the images are not identical. This is due to the indeterminism
of the scan process, which causes noise. The indeterminism is caused in particular
by the characteristics of the scanner, which comprise the following points:

• indeterminism of analogue electronics, 

• irregularities of exposure, 

• precision of the mechanics, 

• material of the parts of the scanner (e.g. the glass pane), 

• operations performed inside the scanner, which possibly contain random decisions 
or rounding errors. 

Moreover, electronics and mechanics and with those the result of a scan can also be 
influenced by other parameters like temperature or soiling. 

It is assumed that the transmission of the data to the computer and the processing of 
the digital data are deterministic. 

Besides the indeterministic parts described, it is assumed that the individual
components of the scanner cannot be exactly the same (see Sect. 3.3). This is one
reason for the differences between different copies of a scanner. Every scanner has
its own typical behavior. Therefore, the exact description of the scan process,
which is a requirement for the simulation, must be done separately for every
scanner.

Moreover, the characteristics of the components change over time. The change of a
scanner over time (concerning its electronics, mechanics and material) is assumed
to be a slow process. The stegosystem reflects the behavior of the scanner at a
particular time. The time interval between the creation of stego images known to a
stego analyst is expected not to be significant. Thus, we do not need to reflect
these changes.



3.3 Effects Expected and their Influence on the Differences between Repeated 
Scans 

The sources for indeterminism mentioned can produce two problems: noise caused by
reproducing errors (reproducing noise) and noise caused by calibrating errors
(calibrating noise). The former is produced by the inherent noise of analogue
devices, the latter results from the differences between the single components.

As mentioned in Sect. 2.1, describing the effects of the scan process will lead to
assumptions on possible differences between repeated scans. The noisier the
digitization of an analogue image, the more differences are likely between the
results of different scans. In the following, obvious effects expected to appear in
the digitized images and possible differences between repeated scans resulting from
these effects are discussed.

Influence of electronics: 

The CCD elements cause a reproducing noise which is spread over the whole image.
Moreover, the noise depends on the gray value to be captured (signal dependent
noise with a constant relative error). Because the CCD elements do not scan just a
point each but possibly overlapping areas, they could yield transition zones between
light and dark areas instead of sharp edges. Besides this optical problem, there are
also other influences which might cause this effect, e.g. dispersion and reflection
of light, the influence of neighboring CCD elements, or hysteresis between the scans
per row.

It is not possible to produce and calibrate the elements so that they yield exactly
the same value for the same shade of gray in the analogue image. This could produce
a calibrating noise, which might be noticed as tiny streaks in vertical direction in
the image scanned. Therefore, the scan direction of an image can possibly be
determined: Comparing the gray values of a digital image, there should be less
variance inside the columns than inside the rows within homogenous areas.

The different behavior of the elements may also be recognizable in the difference
images: Elements that work more noisily will yield more scattered results than
elements that work more deterministically. These different amounts of reproducing
noise could be noticed as vertical streaks as well.
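
The comparison of variances inside columns and inside rows mentioned above can be sketched as follows. This is only an illustration under simple assumptions: the homogenous area is given as a rectangular list of rows of gray values, the population variance is used, and the function names are hypothetical.

```python
from statistics import pvariance


def mean_variance_by_axis(area):
    """Return (mean variance inside columns, mean variance inside rows)
    for a rectangular grayscale area given as a list of rows."""
    columns = list(zip(*area))
    inside_columns = sum(pvariance(col) for col in columns) / len(columns)
    inside_rows = sum(pvariance(row) for row in area) / len(area)
    return inside_columns, inside_rows


def suggests_vertical_streaks(area):
    """True if the gray values vary less inside the columns than inside the
    rows, as expected for a homogenous area scanned with differently
    calibrated CCD elements."""
    inside_columns, inside_rows = mean_variance_by_axis(area)
    return inside_columns < inside_rows


if __name__ == "__main__":
    # Each column carries a constant offset, as one CCD element would cause.
    streaked = [[100, 102, 99, 101] for _ in range(4)]
    print(suggests_vertical_streaks(streaked))
```

For real scans the area would first have to be checked for homogeneity, since image content itself can dominate both variances.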



Effects expected:                            Possible differences:

reproducing noise which is spread over   ->  (at least small) differences spread over
the whole image                              the whole image

signal dependent noise with a constant   ->  stronger differences in light areas
relative error                               (corresponds to a high signal) than in
                                             dark ones

different degree of reproducing noise    ->  vertical streaks within homogenous
for different CCD elements                   areas
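
The second row of this overview can be illustrated by a small simulation. It is a sketch under assumptions that are not taken from measurements: the noise of each pixel is modeled as Gaussian with a standard deviation proportional to the gray value, the relative error is set to 2 %, and the random generator is seeded only to make the run repeatable.

```python
import random


def simulate_scan(true_gray, relative_error, rng):
    """One simulated scan: each pixel receives Gaussian noise whose standard
    deviation is proportional to its gray value, then is rounded and
    clipped to the range 0..255."""
    scan = []
    for gray in true_gray:
        noisy = rng.gauss(gray, relative_error * gray)
        scan.append(min(255, max(0, round(noisy))))
    return scan


def mean_abs_difference(scan_a, scan_b):
    """Mean absolute pixel difference between two aligned scans."""
    return sum(abs(a - b) for a, b in zip(scan_a, scan_b)) / len(scan_a)


rng = random.Random(1)   # fixed seed, assumption for repeatability only
dark = [20] * 1000       # homogenous dark area (low signal)
light = [200] * 1000     # homogenous light area (high signal)

dark_diff = mean_abs_difference(simulate_scan(dark, 0.02, rng),
                                simulate_scan(dark, 0.02, rng))
light_diff = mean_abs_difference(simulate_scan(light, 0.02, rng),
                                 simulate_scan(light, 0.02, rng))
print(dark_diff, light_diff)
```

With a constant relative error, the differences between two simulated scans are larger in the light area than in the dark one, which matches the expectation stated in the overview.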



Influence of exposure:

Irregularities of exposure can produce both typical effects for a special scanner
(noise depending on the position of the analogue image while scanning) and
reproducing noise.

Effects expected:                            Possible differences:

noise depending on the position          ->  differences depending on the position

reproducing noise                        ->  (at least small) differences spread over
                                             the whole image



Influence of mechanics:

It is expected that the horizontal steering of the block is very exact. Minute
irregularities of the steering might map the results of a single CCD element not to
the same ‘column’ of the analogue image but sometimes to the neighboring ‘columns’
instead.

Irregularities of the positioning in vertical direction (i.e. the “stops” for each
row to be scanned) may sometimes slightly shift the position of the CCD line with
regard to the ‘rows’ of the analogue image.

Besides the steering and positioning while scanning, the block must be positioned
again for the next scan. It must change its direction after scanning the area
selected and move back. As a result of these movements, there might be differences
between the starting positions of the block, which would cause a shift of the pixels
between repeated scans.

The following illustrates the scale on which irregularities of the mechanics have an
effect on the digitized image: Assume that the scanner used has a maximum physical
resolution of 600 dpi, i.e. 600 points per 2.54 cm. Then there are only 0.042 mm
between two adjacent points. A shift of the pixels is caused even if the scanner
works exactly in the range of a hundredth of a millimeter.
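
The arithmetic behind this estimate can be written down directly. The helper names are illustrative; the only inputs are the 600 dpi resolution and the positioning error of a hundredth of a millimeter mentioned above.

```python
MM_PER_INCH = 25.4


def pixel_pitch_mm(dpi: float) -> float:
    """Distance between two adjacent scan points in millimeters."""
    return MM_PER_INCH / dpi


def shift_in_pixels(positioning_error_mm: float, dpi: float) -> float:
    """Positioning error expressed as a fraction of the pixel pitch."""
    return positioning_error_mm / pixel_pitch_mm(dpi)


print(round(pixel_pitch_mm(600), 4))          # 0.0423
print(round(shift_in_pixels(0.01, 600), 3))   # 0.236
```

A positioning error of 0.01 mm thus already corresponds to roughly a quarter of the distance between adjacent points at 600 dpi, which is enough to change gray values at edges.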



Effects expected:                            Possible differences:

irregularities of the horizontal         ->  differences on vertical gray edges
steering

irregularities of the positioning in     ->  differences on horizontal gray edges,
vertical direction                           maybe stronger than on vertical edges

shift of the block while moving back     ->  differences on gray edges depending on
                                             the shift



Influence of the surroundings:

Soiling or signs of wear of the glass pane or the optical parts can cause typical
errors for a special scanner but also a calibrating noise which is spread over the
whole image.

Both the results of the CCD elements and the mechanical functionality can be
slightly affected by the surroundings, such as the temperature. The resulting
reproducing noise is spread over the whole image, too.



Effects expected:                            Possible differences:

typical errors for a special scanner     ->  differences depending on the position

noise spread over the whole image        ->  (at least small) differences spread over
                                             the whole image

Influence of random decisions:

While converting analogue to digital data, random decisions can be necessary if
there is a borderline case (which causes reproducing noise).

Effects expected:                            Possible differences:

noise spread over the whole image        ->  (at least small) differences spread over
                                             the whole image



To conclude, the following differences between repeated scans are expected: 

• differences spread over the whole image, 

• differences depending on the gray value, 

• stronger differences on gray edges, 

• vertical streaks within homogenous areas, and 

• differences depending on the position of the analogue image while scanning. 

We have described the obvious effects and the differences they may cause, but we do not
want to exclude the possibility that there are further differences caused by other
effects.

The indeterminism of the scanning causes different kinds of noise, which in turn cause
differences between repeated scans. For a given image, the stegosystem must construct
these differences while embedding. In the following, we sketch how one could describe
the differences.



4 Description of the Possible Differences 

4.1 Strategies and Assumptions 

This section describes the main work to be done. Some first results are presented
to give an impression that the paradigm is feasible.

As described above, various differences between repeated scans are possible. To 
find a description of the differences that can be used by the stegosystem, we want to 
perform two steps. First, we want to examine various scanners to answer the following 
questions: 

• Which of the differences expected can be found? 

• How strong are these differences? 

• In which way do the features of a scanner influence the differences? 

• Is it possible to derive the differences, which must be simulated, from the features 
of the scanner? 

This step will yield, at best, an overall impression of the scanners’ behavior.
However, the stegosystem needs exact instructions on how to generate the differences.

Therefore, a further step is necessary: We want to choose a special scanner in order 
to describe the differences caused by it as exactly as possible. This step includes the 
following tasks: 

• The differences that are typical for this scanner must be identified.

• Appropriate methods to describe these differences depending on the features of the
image must be chosen.

• The set of covers must be defined. 

• Finally, the values of the differences must be evaluated for the permissible covers. 

There are many possible settings for the scan process, e.g. the resolution, the gamma
correction, or the brightness. While investigating different scanners, it is necessary
to try various settings in order to find dependencies or differences that would
otherwise not be considered. However, when the scan process is to be described exactly,
we want to use the standard settings. The main reason is that the stegosystem must work
plausibly, i.e. it must simulate differences between scans that are done with plausible
settings.

To get an impression of the differences caused only by the scanner, the image is
repeatedly scanned without changing its position. In practice, it is also important to
consider the case that the image was removed and later put on the glass pane again.

Difference images were computed to show the differences between repeated scans. 
A difference image is defined as follows: Each pixel represents the absolute difference 
between the pixels of the two compared images at this position. 
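
The definition can be written down directly. The following is a minimal sketch that assumes gray images are stored as nested lists of 8-bit values; the name difference_image and the sample values are illustrative only.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit gray values (0 = black, 255 = white)

def difference_image(first: Image, second: Image) -> Image:
    """Per-pixel absolute difference of two equally sized gray images."""
    if len(first) != len(second) or any(
            len(a) != len(b) for a, b in zip(first, second)):
        raise ValueError("images must have the same dimensions")
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]

scan_1 = [[120, 121, 255], [40, 42, 255]]
scan_2 = [[120, 123, 255], [41, 42, 255]]
print(difference_image(scan_1, scan_2))  # [[0, 2, 0], [1, 0, 0]]
```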

These difference images were processed in order to visualize the differences. An
inverted difference image is defined as follows: A white pixel at position (x, y) means
that the pixel at (x, y) in the first image is equal to the pixel at (x, y) in the
second image. A black pixel, however, means that the pixels at this position are
different. In a strengthened difference image, the meaning of a white pixel is the same.
The gray values of the other pixels are obtained by multiplying the difference at the
respective position by 10 and subtracting the result from 255, which stands for white.
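
Both visualizations can be sketched as follows, starting from a difference image stored as nested lists. The function names are illustrative. Clamping at black is an assumption, since the text does not say how differences larger than 25 are displayed.

```python
from typing import List

Image = List[List[int]]

WHITE, BLACK = 255, 0

def inverted_difference_image(diff: Image) -> Image:
    """White where the two scans agree, black where they differ."""
    return [[WHITE if d == 0 else BLACK for d in row] for row in diff]

def strengthened_difference_image(diff: Image, factor: int = 10) -> Image:
    """White where the scans agree; otherwise 255 - factor * difference.

    Clamping at 0 is an assumption for differences above 25.
    """
    return [[max(BLACK, WHITE - factor * d) for d in row] for row in diff]

diff = [[0, 2, 0], [1, 0, 30]]
print(inverted_difference_image(diff))      # [[255, 0, 255], [0, 255, 0]]
print(strengthened_difference_image(diff))  # [[255, 235, 255], [245, 255, 0]]
```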



4.2 Comparing Different Scanners 

Three flatbed scanners (single pass) were compared in order to get an impression of
possible differences. Important technical data are listed in Tab. 1. In the following,
the differences observed with common settings are compared.



Table 1. Some technical data of the scanners used

Scanner                        Gray Mode                                 Optical Resolution

1) HP ScanJet 6100C            10 bits internal (1024 shades of gray);   600 dpi x 600 dpi
                               8 bits external (256 shades of gray)

2) Primax Colorado Direct /    10 bits internal (1024 shades of gray);   300 dpi x 600 dpi
   D600                        8 bits external (256 shades of gray)

3) Mustek ScanExpress          12 bits internal (4096 shades of gray);   600 dpi x 1200 dpi
   12000P                      8 bits external (256 shades of gray)



Fig. 5 shows the image that was scanned to visualize the possible differences between
repeated scans. We can present only some of the results here. The scans were done with
the suggested settings. In particular, the automatically adjusted values for brightness
and contrast were not changed. We assume that 150 dpi is a quite common resolution for
scanning a photograph. The gamma correction with gamma 2.2, suggested especially for
the first scanner, was used. The image was scanned twice without changing the position
of the analogue image. This way, the differences observed were caused only by the
scanner.




Fig. 5. Image that was scanned 



The difference images computed from two scans with the HP ScanJet 6100C are shown in
Fig. 6. Contrary to expectations, there are more differences in dark areas than in light
ones. Generally, the differences are spread over the whole image. One can easily
recognize the vertical streaks.

The values of the differences do not vary much. In particular, there are no stronger
differences on gray edges. Therefore, it can be assumed that the mechanics of the
scanner works very exactly.

At the moment, we cannot explain the horizontal streaks which appear regularly in the
difference image.




a) inverted difference image b) strengthened difference image 

Fig. 6. Differences between two scans using the HP ScanJet 6100C (150 dpi, gamma 2.2). 



Possible differences between scans using the second scanner (Primax Colorado Direct)
are shown in Fig. 7. The differences caused by the first and the second scanner are
quite different. Now there are strong differences on gray edges; the cat’s silhouette
is completely recognizable. Therefore, the mechanics of this scanner seems to work less
exactly. Neither horizontal nor vertical streaks appear; they might be drowned out by
the overall noise.

However, the differences are not spread over the whole image: within the white areas of
the image there are no differences at all. This uncovers another effect of the scan
process: within areas of saturation, different gray values of the analogue image may be
mapped onto the same digital value.
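
A toy model may help to see why saturation removes the differences. The sketch below assumes a linear conversion that clips at a white point. Both the function digitize and the value 0.9 are hypothetical and are not taken from any of the scanners examined.

```python
def digitize(analogue: float, white_point: float = 0.9, levels: int = 256) -> int:
    """Map an analogue brightness in [0, 1] to a digital gray value.

    Everything at or above `white_point` saturates to the maximum value.
    """
    scaled = analogue / white_point * (levels - 1)
    return min(levels - 1, max(0, round(scaled)))

# Two scans of the same spot with slightly different noise.
for brightness in (0.50, 0.95):
    first = digitize(brightness + 0.004)
    second = digitize(brightness - 0.004)
    print(brightness, first, second, abs(first - second))
```

In the mid-gray case the two noisy readings yield different digital values, whereas in the saturated case both readings are clipped to 255 and the difference vanishes.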




a) inverted difference image b) strengthened difference image

Fig. 7. Differences between two scans using the Colorado Primax (150 dpi, gamma 2.2).



The differences caused by the third scanner (Fig. 8) are quite similar to the
differences caused by the second one: The gray edges are clearly recognizable, and
within the white areas there are no differences (of course, areas of saturation differ
from scanner to scanner). But in contrast to Fig. 7, there are also some vertical
streaks. We are not absolutely sure what caused the streaks in this difference image.
One possible reason might be that the differences between the reproducing noise of the
respective elements are too strong to be masked by the overall noise.




a) inverted difference image b) strengthened difference image

Fig. 8. Differences between two scans using the Mustek ScanExpress (150 dpi, gamma 2.2).



To conclude, it is useful to compare different scanners if one wants to get an overall
impression of possible differences. The results have strengthened the assumption that
the differences between repeated scans strongly depend on the characteristics of the
scanner:

• The better the mechanics of the scanner, the fewer differences can be found on gray
edges.

• Saturation may cause different (light) values of the image to be scanned to be
represented by the same digital values. Inside such areas, there will be no differences
between repeated scans. The first scanner works with an intelligent microprocessor
chip, which performs a lot of operations to improve the scan result [8]. It can be
assumed that the processing inside the other scanners is less successful with regard to
handling the saturation.

• In more homogenous areas, the values of the differences seem to depend on the quality
of the CCD elements. Generally, the first scanner produces less noise than the others.
Between the scans of the second and the third scanner, there are more and stronger
differences.

Even if the scanner works very exactly, there will still be enough space for embedding
data. In general, there are indeterministic parts that cannot be removed despite all
elaborate technology, because they are caused by analogue processes.

As mentioned above, it is necessary to try various settings in order to find
dependencies or differences which were not considered. For example, scanning the image
with the first scanner without gamma correction results in a difference image that
clearly shows the signal dependent noise: The most and the strongest differences were
found within the light areas. The gamma correction changes this characteristic of the
difference image because it maps the higher values, which represent the light grays,
onto fewer values. This way, differences can get lost.
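
The loss of differences in light areas can be illustrated numerically. The sketch below assumes that the gamma correction is applied while the 10-bit internal values are reduced to 8-bit output values. The paper does not state where in the pipeline the correction takes place, so this is an assumption, and the function names are illustrative.

```python
def gamma_correct(value: int, gamma: float = 2.2,
                  in_max: int = 1023, out_max: int = 255) -> int:
    """Map a linear internal value to a gamma-corrected output value."""
    return round(out_max * (value / in_max) ** (1.0 / gamma))

def distinct_outputs(lo: int, hi: int) -> int:
    """Number of different output values for the internal values lo..hi."""
    return len({gamma_correct(v) for v in range(lo, hi + 1)})

dark = distinct_outputs(0, 100)      # darkest 101 internal values
light = distinct_outputs(923, 1023)  # lightest 101 internal values
print(dark, light)
```

Under these assumptions, the lightest internal values collapse onto far fewer output values than the darkest ones, so small variations in light areas often disappear after the correction.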

Furthermore, scanning with a higher resolution shows regular-looking horizontal streaks
(using the first scanner again). Scanning a homogenous image shows that these streaks
depend on the position of the image on the glass pane. We are not yet sure about the
reason for this effect.

The image was also scanned after putting it on the glass pane again. It was assumed
that this would yield stronger differences on gray edges, especially for the first
scanner. Tests confirmed this assumption. For the second scanner, the difference images
did not differ so much because there were already differences on gray edges caused by
irregularities of the mechanics. The third scanner produced strong shifts between
different scans even without the analogue image being moved.



4.3 Describing a Special Scanner 

Because this step has not yet been completed, we only sketch the procedure. According
to Sect. 4.1, the differences that are typical for a scanner must be identified first.
After that, an appropriate method for describing the differences depending on the
image’s characteristics must be chosen. That is, it must be possible to describe the
image in such a way that it is divided into regions according to the differences.

Methods to describe images are known from image processing. There are both 
point processes and area processes. The latter seem to be more useful. 

Suppose the second scanner is to be used as a model for the stegosystem. Typical
differences caused by this scanner are strong differences on gray edges and differences
spread over the whole image except the white areas. Further evaluation of the
difference image has shown that there is a shift in the vertical direction.

An appropriate procedure could be to filter out the edges, combine this result with the
direction of the change from light to dark in the vertical direction, and add noise to
the whole image except the white areas.
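
One way to read this procedure is sketched below. It is not the implementation of the stegosystem, which has not been completed. The function simulate_rescan, the first-order model of the vertical shift, and all parameter values are assumptions chosen for illustration.

```python
import random
from typing import List

Image = List[List[int]]
WHITE = 255

def simulate_rescan(image: Image, shift: float = 0.25, noise: int = 1,
                    edge_threshold: int = 16, seed: int = 0) -> Image:
    """Imitate a second scan: vertical sub-pixel shift on edges plus noise.

    All parameter values are illustrative guesses, not measured data.
    """
    rng = random.Random(seed)
    height = len(image)
    result = []
    for y, row in enumerate(image):
        above = image[max(y - 1, 0)]
        below = image[min(y + 1, height - 1)]
        new_row = []
        for x, value in enumerate(row):
            if value >= WHITE:             # saturated area: no differences
                new_row.append(value)
                continue
            # Signed vertical gradient: the sign gives the direction of the
            # light-to-dark change, the size the strength of the edge.
            gradient = (below[x] - above[x]) / 2.0
            if abs(gradient) >= edge_threshold:
                value += shift * gradient  # first-order effect of the shift
            value += rng.randint(-noise, noise)
            new_row.append(min(WHITE, max(0, round(value))))
        result.append(new_row)
    return result

cover = [[255, 255, 255], [200, 200, 255], [40, 40, 255], [40, 40, 40]]
stego_like = simulate_rescan(cover)
```

In this sketch, white pixels stay untouched, pixels on horizontal gray edges change in proportion to the vertical gradient, and all remaining pixels receive only small noise.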

The last step is to define the set of permissible covers and to evaluate the chosen 
characteristics for these images. Of course, this requires the scanning of the images. 

Generally, the result of the analysis is something like a table that contains every
characteristic analyzed. For these characteristics, the likelihood of possible
differences and the likelihood of possible distances between differences are given.
When the stegosystem processes a cover, it tries to match all characteristics and
modifies the cover as described. If the cover has any characteristic that is not
described, it is rejected.
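
The lookup and the rejection rule could look roughly as follows. The table entries, the characteristic names, and the function plan_differences are hypothetical placeholders, and the likelihoods of distances between differences are left out for brevity.

```python
from typing import Dict, List, Optional

# Hypothetical table: characteristic -> {difference: likelihood}.
# The entries are made-up placeholders, not measured values.
DESCRIPTION: Dict[str, Dict[int, float]] = {
    "homogeneous": {0: 0.7, 1: 0.3},
    "vertical_edge": {0: 0.2, 1: 0.5, 2: 0.3},
    "white": {0: 1.0},
}

def plan_differences(
        characteristics: List[str]) -> Optional[List[Dict[int, float]]]:
    """Return the difference distribution per region, or None to reject."""
    plan = []
    for name in characteristics:
        if name not in DESCRIPTION:
            return None  # undescribed characteristic: the cover is rejected
        plan.append(DESCRIPTION[name])
    return plan

print(plan_differences(["white", "homogeneous"]) is not None)  # True
print(plan_differences(["white", "halftone_pattern"]))         # None
```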

If one takes into consideration the great variety of features that are possible in an
image, it seems very hard to define a useful description that completely covers at
least a group of permissible covers. It may only be possible to realize a stegosystem
of the first stage (Sect. 2.2), which generates a stego after analyzing a number of
scans of the chosen cover.

It is very important to verify the chosen measures, even at this early stage. Thus, the
stegoparadigm and possible realizations are presented before practical results have
been achieved.



5 Summary 

The article discusses a possibility to generate stegosystems which are resistant to
cover-stego-attacks. It turns out that such a system also resists
cover-emb-stego-attacks and stego*-attacks. The method proposed is to simulate a usual
process while embedding. To get as many suitable covers as possible, the simulation of
an indeterministic process is suggested. As an example, the scan process was examined.

Possible differences between repeated scans were pointed out. Results of some tests 
showed that the differences depend on the characteristics of the scanner used. 

The main goal was to describe a stegosystem that is resistant to cover-stego-attacks. 
However, such a stegosystem may have additional features: Most of the pixels of an 
image must be changed, and the values of the differences are quite strong in some 
cases. That could be used to embed more data or to embed the data more robustly, 
respectively. 

Further investigations will be done to complete the description of the differences
depending on the covers permitted. The descriptions need to be validated continuously.



Abstract. In this paper, we study non-adaptive and adaptive steganographic
techniques for images with low number of colors in palette image formats. We 
have introduced the concept of optimal parity assignment for the color palette 
and designed an efficient algorithm that finds the optimal parity assignment. 
The optimal parity is independent of the image histogram and depends only on 
the image palette. Thus, it can be used for increasing the security of 
steganographic techniques that embed message bits into the parity of palette 
colors. We have further developed two adaptive steganographic methods 
designed to avoid areas of uniform color and embed message bits into texture- 
rich portions of the cover image. Both techniques were tested on computer 
generated images with large areas of uniform color and with fonts on uniform 
background. No obvious artifacts were introduced by either technique. The last, 
embedding-while-dithering, technique has been designed for palette images 
obtained from true color images using color quantization and dithering. In this 
technique, both the color quantization error and the error due to message 
embedding are diffused through the image to avoid introducing artifacts 
inconsistent with the dithering algorithm. 



1 Introduction 

The purpose of steganography is to hide messages in otherwise innocent looking 
carriers. The purpose is to achieve security and privacy by masking the very presence 
of communication. Historically, the first steganographic techniques included invisible 
writing using special inks or chemicals. It was also fairly common to hide messages in 
text. By recovering the first letters from words or sentences of some innocent looking 
text, a secret message was communicated. Today, it seems natural to use binary files 
with certain degree of irrelevancy and redundancy to hide data. Digital images, 
videos, and audio tracks are ideal for this purpose. 

Each steganographic technique consists of an embedding algorithm and a detector 
function. The embedding algorithm is used to hide secret messages inside a cover (or 
carrier) document. The embedding process is usually protected by a keyword so that 
only those who possess the secret keyword can access the hidden message. The
detector function is applied to the stego-document and returns the hidden secret 
message. For secure covert communication, it is important that by injecting a secret 
message into a cover document no detectable changes are introduced. The main goal 
is to not raise suspicion and avoid introducing statistically detectable modifications 



A. Pfitzmann (Ed.): IH'99, LNCS 1768, pp. 47-60, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 




48 Jiri Fridrich and Rui Du 



into the stego-document. The embedded information is undetectable if the image with 
the embedded message is consistent with the model of the source from which the 
cover images are drawn. We point out that the ability to detect the presence does not 
automatically imply the ability to read the hidden message. We further note that 
undetectability should not be mistaken for invisibility - a concept tied to human 
perception. At present, the formal theoretical framework for steganography similar to 
Shannon information theory is still missing. For a comprehensive treatment of this 
topic, see [1]. 

The undetectability is directly influenced by the size of the secret message and the 
format and content of the cover image. Obviously, the longer the message, the larger 
the modification of the cover image and the higher the probability that the 
modifications can be statistically detected. The choice of the cover image is also 
important. Natural photographs with 24 bits per pixel provide the best environment 
for message hiding. The redundancy of the data helps to conceal the presence of 
secret messages. Image formats that utilize color palettes provide efficient storage for 
images with limited number of colors, such as charts, computer art, or color quantized 
true color images. The palette image format GIF is recognized by all browsers and is 
widely used over the Internet. Posting a GIF file on one's web page will undoubtedly 
raise less suspicion than sending an image in the BMP format. Despite their 
usefulness and advantages, palette images provide a hostile environment for the 
steganographer. The limited number of palette colors makes the process of secure 
message hiding a difficult challenge. The most common steganographic technique - 
the least significant bit embedding (LSB) cannot be directly applied to palette images 
because too many new colors would be created. Most current steganographic 
algorithms for palette images introduce easily detectable artifacts in the palette or in 
the image data [8,9]. 

On the highest level, the typical palette image format consists of three parts: a 
header, a palette, and image data or pointers to the palette. The palette contains the 
RGB triplets of all colors that occur in the image. Secret messages can be hidden 
either in the palette itself or in the image data. Gifshuffle [10] is a program that uses 
the palette order to hide up to log2(256!) ≈ 1684 bits (about 210 bytes) in the palette by permuting its
entries. While this method does not change the appearance of the image, which is 
certainly an advantage, its security is weak because many image processing software 
products order the palette according to luminance, frequency of occurrence, or some 
other scalar factor. A randomly ordered palette is suspicious, which goes against the 
basic requirement of secure steganography. Also, displaying the image and resaving it 
may erase the information because the palette may be reordered. An alternative and 
perhaps more secure approach is to hide encrypted messages in the LSBs of the 
palette colors. In order to make the message readable from an image with a reordered 
palette, care needs to be taken during message embedding so that the message is 
readable at the receiving end. The common disadvantage of all techniques that embed 
message bits into the palette is a rather limited capacity independent of the image size. 
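
The capacity of a permutation-based scheme is easy to verify. The following sketch computes log2(n!) for a palette of n distinct entries; the function name is illustrative and the code is unrelated to the actual Gifshuffle implementation.

```python
import math

def permutation_capacity_bits(palette_size: int) -> float:
    """log2(n!): bits encodable in the order of n distinct palette entries."""
    return math.lgamma(palette_size + 1) / math.log(2)

bits = permutation_capacity_bits(256)
print(int(bits) // 8)  # 210 (whole bytes)
```

The result does not depend on the number of pixels, which is the capacity limitation noted above.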

Practical methods should have capacity proportional to the image size, or the number of
pixels. Many currently available software tools [3,4,7,10-13] decrease the color depth
of the GIF image to 128, 64, or 32 colors before the embedding starts. This way, when
the LSBs of one, two, or three color channels are perturbed, the total number of newly
created colors will be at most 256. Thus, it will be possible to embed one, two, or
three bits per pixel without introducing visible artifacts into the cover image.
However, as pointed out by Johnson [8,9], the new palettes will have easily detectable
groups of close colors. It is thus relatively easy to distinguish images with and
without secret messages. It appears that secure schemes should not manipulate the
palette but rather embed message bits in the image data.
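
The bound of 256 colors follows from a simple count: every remaining color can appear in 2^c variants when the LSBs of c channels are modified. A minimal sketch, with an illustrative function name:

```python
def colors_after_lsb_embedding(reduced_colors: int, channels: int) -> int:
    """Upper bound on the palette size after LSB changes in `channels` channels."""
    return reduced_colors * 2 ** channels

for reduced, channels in ((128, 1), (64, 2), (32, 3)):
    print(reduced, channels, colors_after_lsb_embedding(reduced, channels))
# Each line ends in 256, the maximum palette size of a GIF image.
```

The same count also shows why the resulting palettes are conspicuous: the colors come in tight groups of two, four, or eight that differ only in their least significant bits.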

In the next section, we discuss methods that embed message bits as parities of 
colors. In Sect. 3, we define the energy of distortions due to message embedding and 
introduce the concept of optimal parity assignment that minimizes this energy. An 
efficient algorithm for the optimal parity is presented and the proof of optimality is 
given. The technique is further extended to multiple pixel embedding. It is shown that 
the optimal parity assignment is also optimal for multiple-pixel embedding. In Sect. 4, 
we study adaptive steganographic techniques. Two methods are introduced and their 
performance is tested on computer generated fractal images. A new technique for 
palette images obtained through color quantization and dithering of true-color images 
is described in Sect. 5. In this new dithering-while-embedding technique, the image 
modifications due to message embedding are diffused through the image in the same 
way as the quantization error during dithering. Finally in Sect. 6, we summarize the 
new techniques and conclude the paper by outlining future research directions. 



2 Message Hiding Using the Parity of Palette Colors 

One of the most popular message hiding schemes for palette-based images (GIF files) 
has been proposed by Machado [11]. In her method called EZ Stego, the palette is 
first sorted by luminance. In the reordered palette, neighboring palette entries are 
typically near to each other in the color space, as well. EZ Stego embeds the message 
in a binary form into the LSB of randomly chosen pointers to the palette colors. One 
can say that this method consists of three steps: parity assignment to palette colors 
(ordering the palette), random, key-dependent selection of pixels, and embedding 
message into color parities of the selected pixels. Message recovery is simply 
achieved by selecting the same pixels and collecting the LSBs of all indices to the 
ordered palette. This algorithm is based on the premise that close colors in the 
luminance-ordered palette are close in the color space. However, since luminance is a 
linear combination of three colors, occasionally colors with similar luminance values 
may be relatively far from each other. 
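
The three steps can be sketched as follows. This is a simplified illustration and not the EZ Stego code: the luminance weights, the use of Python's random module for the key-dependent pixel selection, and all function names are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]

def luminance(color: Color) -> float:
    # Common luma weights; the weights EZ Stego uses are an assumption here.
    r, g, b = color
    return 0.299 * r + 0.587 * g + 0.114 * b

def _sorted_order(palette: Sequence[Color]) -> List[int]:
    """Palette indices ordered by increasing luminance."""
    return sorted(range(len(palette)), key=lambda i: luminance(palette[i]))

def _select(pixel_count: int, bit_count: int, key: int) -> List[int]:
    """Key-dependent pseudo-random choice of pixel positions."""
    return random.Random(key).sample(range(pixel_count), bit_count)

def embed(pixels: List[int], palette: Sequence[Color],
          bits: Sequence[int], key: int) -> List[int]:
    if len(palette) < 2:
        raise ValueError("need at least two palette colors")
    order = _sorted_order(palette)                  # rank -> palette index
    rank = {idx: r for r, idx in enumerate(order)}  # palette index -> rank
    stego = list(pixels)
    for pos, bit in zip(_select(len(pixels), len(bits), key), bits):
        r = rank[stego[pos]]
        if r & 1 != bit:
            r ^= 1                   # neighbor in the sorted palette
            if r >= len(order):      # last entry of an odd-sized palette
                r -= 2
        stego[pos] = order[r]
    return stego

def extract(pixels: Sequence[int], palette: Sequence[Color],
            bit_count: int, key: int) -> List[int]:
    rank = {idx: r for r, idx in enumerate(_sorted_order(palette))}
    return [rank[pixels[pos]] & 1
            for pos in _select(len(pixels), bit_count, key)]

palette = [(0, 0, 0), (255, 255, 255), (200, 30, 30), (30, 200, 30)]
pixels = [0, 1, 2, 3, 2, 1, 0, 3]
message = [1, 0, 1, 1]
stego = embed(pixels, palette, message, key=42)
assert extract(stego, palette, len(message), key=42) == message
```

In this toy palette, a pure red and a pure green of similar luminance would be neighbors in the sorted order, which illustrates the weakness mentioned above: swapping them changes the LSB of the index but produces a clearly visible color change.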

To alleviate this problem, Fridrich [6] has proposed to hide message bits into the 
parity bits of closest colors 1 . For the color of each pixel, into which we embed 
message bits, the closest colors in the palette are searched till a palette entry is found 
with the desired parity bit. The parity of each color could be assigned randomly or 



1 Using parity for message embedding has previously been proposed by Petitcolas [1] and
Crandall [5].




